APPLIED 
ANALYSIS 


Cornelius Lanczos 


DOVER PUBLICATIONS, INC. 
New York 


This Dover edition, first published in 1988, is an unabridged and 
unaltered republication of the work first published by Prentice-Hall, Inc., 
Englewood Cliffs, New Jersey, in 1956. 


Library of Congress Cataloging-in-Publication Data 


Lanczos, Cornelius, 1893- 
Applied analysis / Cornelius Lanczos. 
p. cm. 
Reprint. Originally published: Englewood Cliffs, N.J. : Prentice-Hall,
1956. 
Bibliography: p. 
Includes index. 
ISBN 0-486-65656-X (pbk.) 
1. Mathematical analysis. 2. Mathematical physics. I. Title.
88-3961 
CIP 


Manufactured in the United States by Courier Corporation 
65656X06 
www.doverpublications.com 


To the Memory of the Six Million 
Who Died for the Kiddush Hashem 


PREFACE 


FOR MANY YEARS the author has been engaged in studies of those 
fields of mathematical analysis that are of primary concern to the 
engineer and the physicist. That this area of “workable mathematics” 
did not receive the same attention during the 19th century as did the 
classical fields of analysis is perhaps the result of a historical mis- 
understanding. Until the time of Gauss and Legendre the “workable” 
methods of analysis received the closest attention of the best mathe- 
maticians. The brilliant discovery of the theory of limits changed the 
emphasis. Thenceforth it was considered satisfactory to design 
infinite approximation processes by which the validity of certain 
analytical results could be established, irrespective of whether the 
process used were feasible or not for a given problem. 

It was then that the gradual separation of “pure” and “applied” 
mathematics occurred until we now have the “pure analyst,” who
pursues his ideas in a world of purely theoretical constructions, and 
the “numerical analyst,” who translates the processes of analysis into 
machine operations. 

In actual fact there is a large area between the two, which is not less 
analytical than the analysis of infinite processes but devoted to a 
different branch of analysis, namely, the analysis of finite algorithms. 
Here our objective is the analysis and design of finite processes which 
approximate the solution of an analytical problem. To design pro- 
cedures which will effectively minimize the error in a small number 
of steps and which will estimate the error with sufficient accuracy 
is not a matter of practical interest only, but a matter of scientific 
interest as well. This book is largely devoted to such problems. 

A few remarks concerning the manner of presentation may not 
be out of place. The author is not in favor of sacrificing rigor under 
the disguise that the applied scientist is interested only in the results 
and not in the more or less intricate procedures which lead to those 
results. The concepts and statements of mathematics are sharp 
and uncompromising and any “sloppy” presentation of a mathematical
theorem disqualifies the formulation and throws doubt
on the claimed result. It seems permissible, however, to state and
prove a theorem under less exacting conditions than those in which 
the pure analyst is interested, if the gain achieved by this confinement 
is that the methods and results of mathematical investigations become 
presentable to the student of physics or scientific engineering in 
a language which is not overly strange to him. Furthermore, the 
author has the notion that mathematical formulas have their “secret
life” behind their Golem-like appearance. To bring out the “secret
life” of mathematical relations by an occasional narrative digression
does not appear to him a profanation of the sacred rituals of formal
analysis but merely an attempt at a more integrated way of understanding.
The reader who has to struggle through a maze of
“lemmas,” “corollaries,” and “theorems” can easily get lost in
formalistic details, to the detriment of the essential elements of the 
results obtained. By keeping his mind on the principal points he 
gains in depth, although he may lose in details. The loss is not 
serious, however, since any reader equipped with the elementary 
tools of algebra and calculus can easily interpolate the missing 
details. It is a well-known experience that the only truly enjoyable 
and profitable way of studying mathematics is the method of “filling
in details” by one’s own efforts. This additional work, the author
hopes, will stir the reader’s imagination and may easily lead to 
stimulating discussions and further explorations, on both the 
university and the research levels. 

That a book of this nature cannot exhaust the subject without 
becoming unduly bulky, goes without saying. The broad subject 
of boundary-value problems, together with the theory of integral 
equations, had to be omitted, due to lack of space. But it is perhaps 
no exaggeration to say that the topics considered in each chapter 
are encountered almost daily by the engineer and physicist. A brief 
description of each chapter follows. 

Chapter I. Algebraic Equations. The search for the roots of an 
algebraic equation is frequently encountered in vibration and flutter 
problems and in problems of static and dynamic stability. Some 
useful computing techniques, based on the “movable strip method,” 
are discussed. The Bernoulli method with all its ramifications plays 
the central role, but the scanning of the unit circle for the separation 
of complex roots of nearly equal magnitude and the method of 
reciprocal radii for stability questions are likewise of interest. 

Chapter II. Matrices and Eigenvalue Problems. This chapter is
devoted to a systematic development of the properties of matrices, 
with particular emphasis on those features which are most frequently 
encountered in industrial research. 

Chapter III. Large-Scale Linear Systems. The advent of the 
electronic computer brings the iterative techniques for the solution of 
complicated boundary value problems and vibration problems into 
the foreground. This leads at once to the investigation of polynomial 
operations with matrices. While the general case of complex eigenvalues
could not be included, the “spectroscopic method” of finding
the real eigenvalues of large matrices and the corresponding method
of solving large-scale linear equations are of such general usefulness,
and so naturally tied up with the later chapter on harmonic analysis,
that their treatment was hardly out of place. An additional treatment
of a perturbation problem gives at least a partial answer to the 
complex eigenvalue problem by showing how an arbitrary complex 
eigenvalue and eigenvector can be obtained if we can start with a 
fairly good first approximation of the desired eigenvalue. 

Chapter IV. Harmonic Analysis. The length of this chapter may be 
excused by the extraordinary importance of the Fourier series and its 
corollaries, the Fourier integral and the Laplace transform, in all 
problems of analysis. One might be tempted to paraphrase the 
famous saying of Victor Hugo that if he were asked to destroy all 
literature but keep one single book, he would preserve the Book of 
Job. Similarly, if we were asked to abandon all mathematical 
discoveries save one, we would hardly fail to vote for the Fourier 
series as the candidate for survival. This series has influenced the 
entire course of analysis, in both its theoretical and practical aspects, 
most profoundly. Moreover, its interconnection with other parts of 
analysis is so intimate that if we said “the Fourier series with all its
implications,” a considerable part of our classical analysis would be 
preserved. 

For the purposes of engineering the orthogonality of the Fourier 
functions with respect to equidistant data is perhaps the most 
important single item. Accordingly the present chapter deals 
primarily with the interpolation aspects of the Fourier series, its 
flexibility in representing empirically given equidistant data with 
great ease. An important artifice is needed here to make the series 
applicable, viz., the subtraction of a linear trend which reduces the 
two boundary values to zero and permits the use of a pure sine series.
The remaining Gibbs oscillations, caused by the discontinuity of the 
second derivative, are too small to cause any harm. 
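
The artifice just described can be illustrated by a minimal sketch, assuming equidistant samples on the interval [0, 1] and the standard discrete orthogonality of the sine functions. The function name `sine_interpolate` and the sample data are hypothetical choices made for this illustration only; they are not taken from the text.

```python
import math

def sine_interpolate(samples):
    """Interpolate equidistant samples on [0, 1] (hypothetical helper).

    The linear trend through the two end values is subtracted first,
    so that the reduced data vanish at both boundaries and a pure
    sine series suffices.
    """
    n = len(samples) - 1
    f0, fn = samples[0], samples[-1]
    # Reduced data: zero at k = 0 and k = n after removing the trend.
    reduced = [samples[k] - (f0 + (fn - f0) * k / n) for k in range(n + 1)]
    # Sine coefficients from the discrete orthogonality relation.
    coeffs = [
        (2.0 / n) * sum(reduced[k] * math.sin(math.pi * m * k / n)
                        for k in range(1, n))
        for m in range(1, n)
    ]

    def interpolant(x):
        trend = f0 + (fn - f0) * x
        series = sum(b * math.sin(math.pi * m * x)
                     for m, b in enumerate(coeffs, start=1))
        return trend + series

    return interpolant

# Illustrative data only: nine equidistant samples of exp(x) on [0, 1].
samples = [math.exp(k / 8) for k in range(9)]
approx = sine_interpolate(samples)
max_node_error = max(abs(approx(k / 8) - samples[k]) for k in range(9))
```

At the data points the interpolant reproduces the samples up to rounding; between them it supplies a smooth analytical expression.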

An additional artifice, the application of the “σ-factors,” counteracts
the divergence-producing Gibbs oscillations if the series is
differentiated. The same method smooths out the unpleasant Gibbs 
oscillations in the neighborhood of a discontinuity and in the 
representation of the delta-function. 
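
The smoothing effect near a discontinuity can be sketched numerically, assuming the commonly quoted form of the factors, σ_k = sin(πk/n)/(πk/n), applied to the k-th harmonic of a series truncated below n. The square wave, the value n = 40, and the function name below are illustrative assumptions, not examples from the text.

```python
import math

def square_wave_partial_sum(x, n, smoothed=False):
    """Partial Fourier sum of the square wave sign(x) on (-pi, pi).

    Only odd harmonics k < n occur. With smoothed=True each term is
    multiplied by the assumed factor sigma_k = sin(pi*k/n) / (pi*k/n).
    """
    total = 0.0
    for k in range(1, n, 2):
        if smoothed:
            sigma = math.sin(math.pi * k / n) / (math.pi * k / n)
        else:
            sigma = 1.0
        total += sigma * math.sin(k * x) / k
    return 4.0 / math.pi * total

n = 40
grid = [math.pi * j / 4000 for j in range(1, 4000)]
# Largest value reached on (0, pi); the function itself equals 1 there.
raw_peak = max(square_wave_partial_sum(x, n) for x in grid)
smooth_peak = max(square_wave_partial_sum(x, n, smoothed=True) for x in grid)
```

Comparing `raw_peak` with `smooth_peak` shows how much of the overshoot beyond the true value 1 is removed by the factors.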

In electric network analysis the methods of the Fourier transform 
and the Laplace transform have gained enormous impetus during the 
last few years. These transforms are not only theoretical devices for 
proving certain basic theorems, but are of fundamental importance 
for the practical construction of the input-output relation of linear 
networks. The interpolatory solution of the filter problem and a 
variety of methods for the inversion of the Laplace transform are 
discussed as problems of applied analysis which have very real 
technical significance. Finally, the frequently encountered “search
for hidden periodicities” is dealt with and a numerical scheme 
developed which achieves greater independence of the various 
frequencies—and thus higher resolution power and higher accuracy— 
than the traditional schemes. 

Chapter V. Data Analysis. The problem of the reduction of data 
and the problem of obtaining the first and even second derivatives 
of an empirically given function are constantly encountered in 
tracking problems, but also occur in similar form in ordinary curve- 
fitting problems. Two methods of smoothing are discussed: smooth- 
ing in the small and smoothing in the large. In the first case, local 
least-square parabolas are employed which lead to a certain weighted 
average of every observation with a few of its left and right neigh- 
bors. In the second case, a Fourier analysis is performed and the 
smoothing achieved by merely truncating the series at a judiciously 
chosen point. The latter technique provides us simultaneously with 
an analytical expression which represents all our data and inter- 
polates them at any desired point of the interval. If it so happens that 
a close polynomial approximation is desired, a method is described 
which transforms the Fourier series into a polynomial series of 
strongest convergence. We thus avoid the pitfalls of equidistant 
Lagrangian interpolation and obtain a polynomial which fits our 
data with small and practically uniformly distributed errors. 

Chapter VI. Quadrature Methods. Since the dawn of science the
problem of integration has fascinated the scholar. Each scientific 
period added its own share of knowledge to the problem of quad- 
rature. While Archimedes, who first introduced integration as an 
exact limit process, used only trapezoids with constantly decreasing 
sides for the purpose of quadrature, later ages refined the technique 
by operating with interpolating polynomials of higher order. The 
present chapter gives a survey of a variety of these methods. The 
Gaussian quadrature looms large as the most advanced of all 
quadrature methods. A slight modification of the method is described 
which makes it numerically more palatable by avoiding interpolation 
at irrational points. Moreover, the fundamental idea of the Gaussian 
method is translated to the case when only boundary values are at our 
disposal, viz., the value of the function and its derivatives up to a 
certain order at both endpoints of the interval. The resulting 
quadrature formula has strong convergence and can be used for the 
solution of boundary value problems and eigenvalue problems 
associated with ordinary differential equations. The method is 
demonstrated with the help of a few numerical examples. 

Chapter VII. Power Expansions. The representation of functions 
by polynomials is an old art. However, to represent a function within 
a given interval by a few powers but with small error goes beyond 
the realm of the Taylor expansion and requires the theory of ortho- 
gonal function systems. There is in particular one special class of 
polynomials, the “Chebyshev polynomials,” which assures stronger 
convergence than any other class of polynomials. In actual fact we 
go once more back to the Fourier series, since an expansion into 
Chebyshev polynomials is in reality nothing but a cosine series in a 
modified variable. 
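
The remark that a Chebyshev expansion is a cosine series in a modified variable rests on the identity T_n(cos θ) = cos nθ. A short numerical check of this identity is sketched below; the function name `chebyshev_t` is a hypothetical helper, and the polynomials are generated by the standard three-term recurrence T_{n+1}(x) = 2x T_n(x) − T_{n−1}(x).

```python
import math

def chebyshev_t(n, x):
    """Evaluate T_n(x) by the three-term recurrence (hypothetical helper)."""
    t_prev, t_curr = 1.0, x          # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

# Compare T_n(cos(theta)) with cos(n*theta) for a range of orders and angles.
thetas = [0.1 * j for j in range(32)]
max_gap = max(
    abs(chebyshev_t(n, math.cos(theta)) - math.cos(n * theta))
    for n in range(8)
    for theta in thetas
)
```

The quantity `max_gap` stays at the level of rounding error, which is the numerical expression of the substitution x = cos θ.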

We can put these polynomials to good use in the problem of solving 
ordinary differential equations with coefficients which are rational 
functions of x. Since most of the fundamental transcendentals 
encountered in mathematical physics are definable in terms of such 
differential equations, we obtain rapidly convergent expansions for a 
large variety of functions by a simple technique. We terminate our 
series from the beginning to a finite polynomial of a given order. This 
necessitates an error term on the right side of the given differential 
equation. This error term is put proportional to a Chebyshev 
polynomial of properly chosen order. We now obtain simple 
recurrence relations from which the coefficients of the approximating
polynomial can be determined. We may have to repeat this process 
several times and obtain the final result as a linear superposition of 
the composing polynomials. This “τ-method” is very helpful in
putting the powers to work with maximum efficiency, whether our 
aim is to approximate an elementary function for operational 
purposes, or a transcendental function for evaluation purposes. The 
gain in reducing the error in comparison to that of the Taylor series 
(if the Taylor series exists at all) is always very considerable. 


Acknowledgments. This book has grown through years of scientific 
thinking. In these years the author had the good fortune of innumer- 
able discussions with colleagues and friends, which gave him the 
basic background on which to build. The number of people who 
helped the author in his endeavors is thus very great and their 
enumeration impossible. It will be more adequate to enumerate 
the institutions with which he was connected during the slow ripening 
of his thoughts. 

1. During the author’s memorable years at Purdue University 
(1931-45) Dr. W. Marshall of the Department of Mathematics and 
Dr. K. Lark-Horowitz of the Department of Physics organized a 
lecture course in “Approximation Methods of Analysis,” where the 
author first came in touch with approximation problems. It was 
during those years that the author discovered for himself the out- 
standing properties of the Chebyshev polynomials, which had a 
decisive influence on his later scientific development. 

2. During the national emergency the author spent the year 1943- 
44 at the Mathematical Tables Project, New York City, headed by 
Dr. A. N. Lowan, Director. The associations with an outstanding 
staff of numerical analysts were most gratifying. 

3. During 1944-45 the author gave two lecture courses: “Engineering
Applications of Rapidly Converging Series” and “Approximation
Mathematics Course,” under the auspices of the Physical Research 
Unit of the Boeing Airplane Company in Seattle, Washington. 
The excellently prepared mimeographed notes of these courses— 
compiled with the able assistance of Marius Cohn—form the basic 
core of the present book. 

In 1946 Dr. C. K. Stedman, Head of the Physical Research Unit of 
the Boeing Airplane Company, invited the author to join his unit as a 
mathematical consultant and research engineer. The benefits he
derived from the daily discussions with an unusually select group of 
excellent physicists and electrical engineers cannot be measured. 

4. At the invitation of Dr. J. H. Curtiss, then Chief, National 
Bureau of Standards, Washington, D. C., the author joined in 1949
the newly founded Institute for Numerical Analysis at the University 
of California, Los Angeles. Under the leadership of Dr. Curtiss, the 
Institute provided a scholarly atmosphere and colleagual associations 
on a level which had few parallels anywhere in the world. The 
generosity with which the Institute supported the author’s scientific 
projects will remain in his memory with undiminished force. 

5. At the invitation of the Dublin Institute for Advanced Studies, 
the author spent a memorable year (1952-53) in Dublin, Eire, in 
daily contact with the directors of the Institute, Professor E. 
Schroedinger and Professor J. L. Synge. The quadrature formula of 
§ 22, Chapter VI, and its application to the solution of eigenvalue 
problems were developed during this year. 

6. During the winter 1953-54 the author was affiliated with North 
American Aviation, Los Angeles, as a staff member of the Tabulating 
Department of Charles F. Davis, numerical analyst. A lecture 
course attended by a selected group of engineers led to memorable 
friendships and is reflected in Chapters II and III of this book. 
The “spectroscopic eigenvalue analysis” was developed during these
months. 

In enumerating the extraordinary opportunities with which his 
good fortune endowed him in his scientific life, the author should not 
fail to mention the splendid assistance he received in the numerical 
documentation of his mathematical endeavors. For the computation 
of the numerical examples of the book he is primarily indebted to 
Miss Mary Ellen Russell, Research Assistant of the Physical Research 
Unit, Boeing Airplane Company, Seattle, Washington, while the 
Appendix Tables were principally prepared by Miss Lillian Forthal, 
INA, National Bureau of Standards, Los Angeles, California. 

C. L. 


Dublin Institute for Advanced Studies 
Dublin, Eire 


CONTENTS


INTRODUCTION

1. Pure and applied mathematics
2. Pure analysis, practical analysis, numerical analysis


Chapter I
ALGEBRAIC EQUATIONS

1. Historical introduction
2. Allied fields
3. Cubic equations
4. Numerical example
5. Newton’s method
6. Numerical example for Newton’s method
7. Horner’s scheme
8. The movable strip technique
9. The remaining roots of the cubic
10. Substitution of a complex number into a polynomial
11. Equations of fourth order
12. Equations of higher order
13. The method of moments
14. Synthetic division of two polynomials
15. Power sums and the absolutely largest root
16. Estimation of the largest absolute value
17. Scanning of the unit circle
18. Transformation by reciprocal radii
19. Roots near the imaginary axis
20. Multiple roots
21. Algebraic equations with complex coefficients
22. Stability analysis


Chapter II
MATRICES AND EIGENVALUE PROBLEMS

1. Historical survey
2. Vectors and tensors
3. Matrices as algebraic quantities
4. Eigenvalue analysis
5. The Hamilton-Cayley equation
6. Numerical example of a complete eigenvalue analysis
7. Algebraic treatment of the orthogonality of eigenvectors
8. The eigenvalue problem in geometrical interpretation
9. The principal axis transformation of a matrix
10. Skew-angular reference systems
11. Principal axis transformation in skew-angular systems
12. The invariance of matrix equations under orthogonal transformations
13. The invariance of matrix equations under arbitrary linear transformations
14. Commutative and noncommutative matrices
15. Inversion of a matrix. The Gaussian elimination method
16. Successive orthogonalization of a matrix
17. Inversion of a triangular matrix
18. Numerical example for the successive orthogonalization of a matrix
19. Triangularization of a matrix
20. Inversion of a complex matrix
21. Solution of codiagonal systems
22. Matrix inversion by partitioning
23. Perturbation methods
24. The compatibility of linear equations
25. Overdetermination and the principle of least squares
26. Natural and artificial skewness of a linear set of equations
27. Orthogonalization of an arbitrary linear system
28. The effect of noise on the solution of large linear systems


Chapter III
LARGE-SCALE LINEAR SYSTEMS

1. Historical introduction
2. Polynomial operations with matrices
3. The p,q algorithm
4. The Chebyshev polynomials
5. Spectroscopic eigenvalue analysis
6. Generation of the eigenvectors
7. Iterative solution of large-scale linear systems
8. The residual test
9. The smallest eigenvalue of a Hermitian matrix
10. The smallest eigenvalue of an arbitrary matrix


Chapter IV
HARMONIC ANALYSIS

1. Historical notes
2. Basic theorems
3. Least square approximations
4. The orthogonality of the Fourier functions
5. Separation of the sine and the cosine series
6. Differentiation of a Fourier series
7. Trigonometric expansion of the delta function
8. Extension of the trigonometric series to the nonintegrable functions
9. Smoothing of the Gibbs oscillations by the σ factors
10. General character of the σ smoothing
11. The method of trigonometric interpolation
12. Interpolation by sine functions
13. Interpolation by cosine functions
14. Harmonic analysis of equidistant data
15. The error of trigonometric interpolation
16. Interpolation by Chebyshev polynomials
17. The Fourier integral
18. The input-output relation of electric networks
19. Empirical determination of the input-output relation
20. Interpolation of the Fourier transform
21. Interpolatory filter analysis
22. Search for hidden periodicities
23. Separation of exponentials
24. The Laplace transform
25. Network analysis and Laplace transform
26. Inversion of the Laplace transform
27. Inversion by Legendre polynomials
28. Inversion by Chebyshev polynomials
29. Inversion by Fourier series
30. Inversion by Laguerre functions
31. Interpolation of the Laplace transform


Chapter V
DATA ANALYSIS

1. Historical introduction
2. Interpolation by simple differences
3. Interpolation by central differences
4. Differentiation of a tabulated function
5. The difficulties of a difference table
6. The fundamental principle of the method of least squares
7. Smoothing of data by fourth differences
8. Differentiation of an empirical function
9. Differentiation by integration
10. The second derivative of an empirical function
11. Smoothing in the large by Fourier analysis
12. Empirical determination of the cutoff frequency
13. Least-square polynomials
14. Polynomial interpolations in the large
15. The convergence of equidistant polynomial interpolation
16. Orthogonal function systems
17. Self-adjoint differential operators
18. The Sturm-Liouville differential equation
19. The hypergeometric series
20. The Jacobi polynomials
21. Interpolation by orthogonal polynomials


Chapter VI
QUADRATURE METHODS

1. Historical notes
2. Quadrature by planimeters
3. The trapezoidal rule
4. Simpson’s rule
5. The accuracy of Simpson’s formula
6. The accuracy of the trapezoidal rule
7. The trapezoidal rule with end correction
8. Numerical examples
9. Approximation by polynomials of higher order
10. The Gaussian quadrature method
11. Numerical example
12. The error of the Gaussian quadrature
13. The coefficients of a quadrature formula with arbitrary zeros
14. Gaussian quadrature with rounded-off zeros
15. The use of double roots
16. Engineering applications of the Gaussian quadrature method
17. Simpson’s formula with end correction
18. Quadrature involving exponentials
19. Quadrature by differentiation
20. The exponential function
21. Eigenvalue problems
22. Convergence of the quadrature based on boundary values


Chapter VII
POWER EXPANSIONS

1. Historical introduction
2. Analytical extension by reciprocal radii
3. Numerical example
4. The convergence of the Taylor series
5. Rigid and flexible expansions
6. Expansions in orthogonal polynomials
7. The Chebyshev polynomials
8. The shifted Chebyshev polynomials
9. Telescoping of a power series by successive reductions
10. Telescoping of a power series by rearrangement
11. Power expansions beyond the Taylor range
12. The τ method
13. The canonical polynomials
14. Examples for the τ method
15. Estimation of the error by the τ method
16. The square root of a complex number
17. Generalization of the τ method. The method of selected points


APPENDIX: NUMERICAL TABLES


BIBLIOGRAPHY 


The following books are recommended for collateral or more 
advanced reading. They represent a selective bibliography of books, 
written in English, whose topic is similar or related to that of the 
present book. If important sources have been overlooked, this did 
not occur intentionally. The Courant-Hilbert and Whittaker-Watson 
books are irreplaceable sources of information. In the field of 
numerical analysis the books of Milne, Scarborough, and Hartree 
are basic. 


{1} COURANT, R. and D. HILBERT, Methods of Mathematical
     Physics, Vol. 1 (Interscience Publishers, New York, 1953).

{2} DOHERTY, R. E. and E. G. KELLER, Mathematics of Modern
     Engineering, Vols. I and II (John Wiley & Sons, Inc., New
     York, 1936 and 1942).

{3} DWYER, P. S., Linear Computations (John Wiley & Sons, Inc.,
     New York, 1951).

{4} HARTREE, D. R., Numerical Analysis (Clarendon Press,
     Oxford, 1952).

{5} HOUSEHOLDER, A. S., Principles of Numerical Analysis
     (McGraw-Hill Book Company, Inc., New York, 1953).

{6} JAEGER, J. C., Applied Mathematics (Clarendon Press, Oxford,
     1951).

{7} KARMAN, TH. v. and M. A. BIOT, Mathematical Methods of
     Engineering (McGraw-Hill Book Company, Inc., New York,
     1940).

{8} MILNE, W. E., Numerical Calculus (Princeton University Press,
     Princeton, N. J., 1949).

{9} MURNAGHAN, F. D., Introduction to Applied Mathematics
     (John Wiley & Sons, Inc., New York, 1948).

{10} PIPES, L. A., Applied Mathematics for Engineers and Physicists
     (McGraw-Hill Book Company, Inc., New York, 1946).

{11} SCARBOROUGH, J. B., Numerical Mathematical Analysis (Johns
     Hopkins Press, Baltimore, 1950).

{12} SCHELKUNOFF, S. A., Applied Mathematics for Engineers and
     Scientists (D. Van Nostrand Company, Inc., New York, 1948).

{13} SMITH, L. P., Mathematical Methods for Scientists and Engineers
     (Prentice-Hall, Inc., Englewood Cliffs, N. J., 1953).

{14} WHITTAKER, E. T. and G. ROBINSON, The Calculus of Observations
     (Blackie & Son, Glasgow, 1940).

{15} WHITTAKER, E. T. and G. N. WATSON, A Course of Modern
     Analysis (Cambridge University Press, London, 1935).


Books of a more specific character are listed at the end of each 
chapter. References in braces { } refer to the books of the general 
Bibliography, those in brackets [] to the books and articles of the 
chapter bibliographies. 


INTRODUCTION 


1. Pure and applied mathematics. The history of mathematics 
reveals that the interest in the formal processes of mathematics was 
seldom divested of the desire to obtain an adequate picture of 
the physical universe. The postulates and operations of analysis are 
not chosen arbitrarily, but are postulates and operations which map 
the geometrical order of things in the abstract realm of numbers. 
The formal operations of analysis are thus merely one link in our 
desire to discover the inherent functional order of the physical 
universe. The entire process involves three phases: 

1. A given physical situation is translated into the realm of
numbers. 

2. By purely formal operations with these numbers certain 
mathematical results are obtained. 

3. These results are translated back into the world of physical 
reality. 

During the development of mathematics, Phase 2 of this translation 
process becomes eventually an independent endeavor in itself. We 
can concentrate on the formal processes of analysis without asking 
ourselves whether the problem posed has necessarily a counterpart 
in nature. Similarly, we can stop with the analytical result obtained 
in Phase 2 without interpreting it back into the physical order of 
things. 

The nineteenth century invented the terms “pure” and “applied” 
mathematics for the characterization of these two situations, a 
terminology which is far from being adequate and satisfactory. The 
term “pure mathematics” as an expression for isolation of the formal 
processes of mathematics has the connotation that this kind of 
intellectual endeavor is more “l’art pour l’art,” more “pure” and thus
more “idealistic” than the other endeavor which keeps in closer touch
with the inherent order of the physical universe. This connotation is 
hardly tenable and could have been avoided by a more precise 
terminology. Yet we continue with the usage of words, even if they
have been coined wrongly by conjuring up associations which are not 
warranted philosophically. The expressions “negative numbers” and
“imaginary numbers” originate from a period which did not under- 
stand properly the true significance of these concepts. The names 
survive, however, and cause misunderstandings among those who do 
not pay close attention to the technical definition of these words. 
Similar is the situation relative to the misnomers “pure” and
“applied mathematics.”


2. Pure analysis, practical analysis, numerical analysis. There 
exists, however, a somewhat different situation which again necessi- 
tates a distinctive terminology and where again the words “pure” and 
“practical” came into use without proper justification. Already the 
old Greek mathematicians of the fourth and third centuries B.C. 
recognized the fundamental fact that certain mathematical situations 
require an infinite sequence of never-ending approximations which come 
nearer and nearer to the desired result but without ever reaching it 
exactly. Archimedes discovered in his fundamental treatise on the 
circle that the circumference of a circle is not only not calculable 
exactly but not even definable exactly. The exact processes of algebra 
are thus replaced by another kind of approach termed by the Greeks 
the “method of exhaustion,” while nineteenth century mathematics 
adopted the name “infinite limit processes.” In these limit processes, 
frequently characterized by an infinite expansion, we do not endeavor 
to obtain a quantity, but merely to approach it with an error which can 
be made as small as we wish. “Absolute accuracy” is thus replaced 
by “arbitrarily great accuracy.” The first kind of accuracy, obtain- 
able in algebra but not obtainable in higher mathematics, means the 
“error zero.” The second kind of accuracy, typical for the processes 
of higher mathematics, means a “finite but arbitrarily small error.” 

The great perspectives opened by the theory of limits of Cauchy 
and Gauss led to an overwhelming emphasis of infinite approximation 
processes. In these processes the number of steps employed is of no 
further consequence. We are not interested in what happens during 
a finite number of steps. We are satisfied if we know what happens 
eventually, if the number of steps is increased unlimitedly. 

Much less attention was paid to another kind of approximation 
process which is logically equally feasible. We may like to know what 
error is involved if we pursue a certain process a definite finite number 
of times. In particular we may be interested in designing approxima- 
tion processes which obtain a certain result with a minimum error 
for a given number of steps. The accuracy here is neither of the 
absolute nor of the arbitrarily great kind. We are reconciled to the 
fact that an error is committed. We should like to have tools for 
estimating this error. Moreover, we would like to investigate what 
processes can be pursued for effective reduction of this error. 

This branch of analysis, which received historically much less 
attention than the previously discussed limit processes, has not been 
designated by any adequate name. Since, however, the classical or 
“pure” analysis deals with infinite approximation processes, this 
other branch of analysis is sometimes referred to as “practical 
analysis.” The name comes from the fact that in physics and engineer- 
ing we are not interested in reducing the error of a certain approxima- 
tion process to zero or to an arbitrarily small amount, since our 
observations are of limited accuracy anyway. Hence we are satisfied 
with an answer which is “theoretically” imperfect but “practically” 
acceptable. Moreover, the theoretically perfect answer may involve 
tools which are too cumbersome to be available practically. Hence in 
practical analysis we are concerned with finite processes which are 
numerically accessible and which in a relatively few steps give an 
accuracy deemed satisfactory for certain aims of physics and 
engineering. 

Here again the juxtaposition of the words “pure” and “practical” 
conveys a wrong evaluation. One may be inclined to believe that only 
the processes of “pure” analysis are of distinctly mathematical 
interest, while the processes of “practical” analysis are developed 
solely in view of the needs of the “practical” man, i.e., the physicist 
and engineer. This, however, is equivalent to putting an accidental 
feature of the historical development on an absolute pedestal. Let us 
consider as an example an infinite expansion of the number π which 
may require 100 terms for an accuracy of 1%. If another expansion 
of “practical analysis” obtained the same accuracy with only five 
terms, we are justified in assuming that the second expansion is, 
even from the purely analytical angle, more adequate than the first 
very slowly convergent series. The increased convergence of the 
second series may be booked as a “practical” achievement, but it 
may also be booked as a result of purely logical interest. 
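
The contrast between a slowly and a rapidly convergent expansion of π can be tried numerically. The following Python sketch is only an illustration: the two series used (the Leibniz series and Machin's arctangent formula) are assumptions chosen here and are not the expansions the text has in mind, and the function names are hypothetical.

```python
import math


def leibniz_pi(terms):
    """Partial sum of the Leibniz series 4*(1 - 1/3 + 1/5 - ...)."""
    return 4.0 * sum((-1) ** k / (2 * k + 1) for k in range(terms))


def machin_pi(terms):
    """Machin's formula pi = 16*atan(1/5) - 4*atan(1/239),
    each arctangent truncated after `terms` terms of its power series."""
    def atan_series(x, n):
        return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(n))
    return 16.0 * atan_series(1 / 5, terms) - 4.0 * atan_series(1 / 239, terms)


def relative_error(approx):
    """Relative deviation of an approximation from math.pi."""
    return abs(approx - math.pi) / math.pi


if __name__ == "__main__":
    for n in (5, 100):
        print(n, relative_error(leibniz_pi(n)), relative_error(machin_pi(n)))
```

With five terms the first series is still several per cent in error, while the second is already far below the 1% level; both series are "infinite expansions" in the sense of pure analysis, yet only one of them is a usable finite process.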

One can hardly deny that here is a branch of mathematical analysis 
which would deserve a more adequate name than the word “practical” 
with its utilitarian and pragmatic overtones. The present book makes 
a move in this direction by introducing a technical term for this class 
of analytical processes. The Greek word “parexic” (with the roots 
para = almost, quasi, and ek = out) means “nearby.” Hence the 
term “parexic analysis” can well be adopted to mean that we do not 
want an exact but only a “nearby” determination of a certain 
quantity. We can then speak of parexic methods, parexic expansions, 
parexic viewpoints, in contradistinction to the corresponding 
methods, expansions, and viewpoints of “pure analysis” which aim 
at arbitrary accuracy with the help of infinite processes. The author 
found that such a terminology is quite rewarding by obtaining a 
brief notation for something which under the customary terminology 
is expressible only by the way of circumlocution.¹ 

During the last few years a new word came into vogue: “numerical 
analysis.” This term is well suited to the designation of certain 
aspects of mathematical analysis which deal with translation of 
mathematical processes into operations with numbers. The basic 
viewpoint here is the ease with which certain analytical methods can 
be handled if the mathematical quantities are replaced by actual 
numbers; moreover, this branch of analysis is concerned with the 
accumulation of rounding errors, caused not by the approximate 
nature of parexic processes but by the approximate nature of the 
arithmetic processes of multiplications and divisions, if we restrict 
ourselves to a definite number of decimal places. 

We thus come to the conclusion that pure analysis, parexic 
analysis, and numerical analysis represent three well-circumscribed 
phases of mathematical investigations which have their own fields of 
interest and which pursue these interests with characteristic methods 
of their own. The present book is primarily devoted to the field of 
“parexic analysis,” but without losing sight of the general aims 
of pure analysis and the more arithmetical aspects of numerical 
analysis. 


1 This is the only instance in which the present book deviates from the 
customary terminology (or lack of terminology) of mathematics. The author is 
well aware that the coining of new words is the privilege of much more advanced 
minds. But in this particular instance an obvious emergency existed, and if the 
author’s suggestion is not accepted, at least he has called attention to the existence 
of this problem. 


I 
ALGEBRAIC EQUATIONS 


1. Historical introduction. Algebraic equations of the first and 
second order aroused the interest of scientists from the earliest days. 
While the early Egyptians solved mostly equations of first order, the 
early Babylonians (about 2000 B.c.) were already familiar with 
the solution of the quadratic equation, and constructed tables for the 
solution of cubic equations by bringing the general cubic into a normal 
form. 

The Hindus developed the systematic algebraic theory of the 
equations of first and second order (seventh century). The standard 
method of solving the general quadratic equation by completing the 
square is a Hindu invention. The Hindus were familiar with the 
operational viewpoint and were not afraid of the use of negative 
numbers, considering them as “debts.” The clear insight into the 
nature of imaginary and complex numbers came much later, in the 
time of Euler (eighteenth century). 

The solution of cubic equations was first discovered by the Italian 
Tartaglia (early sixteenth century); Cardano’s pupil Ferrari added 
a few years later the solution of biquadratic equations. The 
essentially different character of equations of fifth and higher order 
was clearly recognized by Lagrange (late eighteenth century), but the 
first exact proof that general equations of fifth and higher order 
cannot be solved by purely algebraic tools is due to the Norwegian, 
Abel (1824), while a few years later (1832) the French Galois gave the 
general group-theoretical foundation of the entire problem. 

The “fundamental theorem of algebra” states that every algebraic 
equation has at least one solution within the realm of complex 
numbers. If this is proved, we can immediately infer (by successive 
divisions by the root factors) that every polynomial of the order n can 
be resolved into a product of n root factors. The first rigorous proof 
of the fundamental theorem of algebra was given by Gauss when 
only 22 years of age (1799). Later Cauchy’s theory of the functions of 
a complex variable provided a deeper insight into the nature of the 
roots of an algebraic equation and yielded a simplified proof for the 
fundamental theorem. 

The existence of generally complex roots of an algebraic equation 
of nth order is in no contradiction to the unsolvability of an algebraic 
equation of fifth or higher order by algebraic means. The latter state- 
ment means that the roots of a general algebraic equation of higher 
than fourth order are not obtainable by purely algebraic operations 
on the coefficients (i.e., addition, subtraction, multiplication, division, 
raising to a power and taking the root). Such operations can 
approximate, however, the roots with any degree of accuracy. 


2. Allied fields. (a) The problem of solving an algebraic equation 
of nth order is closely related to the theory of vibrations around a state 
of equilibrium. The frequencies (or the squares of the frequencies) 
of a mechanical system appear as the “characteristic roots” or 
“eigenvalues” of a matrix, obtainable by solving the “characteristic 
equation” of the matrix, which is an algebraic equation of the order n. 

(b) In electrical engineering the response of an electric network is 
always a linear superposition of exponential functions. The 
exponents of these functions are obtainable as the roots of a certain 
polynomial which can be constructed if the elements of the network 
and the network diagram are given. 

(c) Intricate algebraic and geometric relations frequently yield by 
elimination an algebraic equation of second or higher order for one 
of the unknowns. 


3. Cubic equations. Equations of third and fourth order are still 
solvable by algebraic formulas. However, the numerical computa- 
tions required by the formulas are usually so involved and time- 
absorbing that we prefer less cumbersome methods which give the 
roots in approximation only but still close enough for later refinement. 

The solution of a cubic equation (with real coefficients) is parti- 
cularly convenient since one of the roots must be real. After finding 
this root, the other two roots follow immediately by solving a 
quadratic equation. 


A general cubic equation can be written in the form 

$$f(\xi) = \xi^3 + a\xi^2 + b\xi - c = 0 \tag{1-3.1}$$

The factor of $\xi^3$ can always be normalized to 1 since we can divide 
through by the highest coefficient. Moreover, the absolute term can 
always be made negative because, if it is originally positive, we put 
$\xi_1 = -\xi$ and operate with this $\xi_1$. 

Now it is convenient to introduce a new scale factor which will 
normalize the absolute term to —1. We put 


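$$x = \alpha\xi, \quad a_1 = \alpha a, \quad b_1 = \alpha^2 b, \quad c_1 = \alpha^3 c \tag{1-3.2}$$

and write the new equation 

$$f(x) = x^3 + a_1 x^2 + b_1 x - c_1 = 0 \tag{1-3.3}$$

If we choose 

$$\alpha = 1/\sqrt[3]{c} \tag{1-3.4}$$

we obtain 

$$c_1 = 1 \tag{1-3.5}$$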


Now, since $f(0)$ is negative and $f(\infty)$ is positive, we know that 
there must be at least one root between $x = 0$ and $x = \infty$. We put 
$x = 1$ and evaluate $f(1)$. If $f(1)$ is positive, the root must be between 
0 and 1; if $f(1)$ is negative, the root must be between 1 and $\infty$. Moreover, 
since 

$$x_1 x_2 x_3 = 1 \tag{1-3.6}$$

we know in advance that we cannot have three roots between 0 and 1, 
or 1 and $\infty$. Hence, if $f(1) > 0$, we know that there must be one and 
only one real root in the interval $[0,1]$, while if $f(1) < 0$, we know that 
there must be one and only one real root in the interval $[1,\infty]$. The 
latter interval can be changed to the interval $[1,0]$ by the transformation 

$$\bar{x} = \frac{1}{x} \tag{1-3.7}$$

which simply means that the coefficients of the equation change their 
sequence: 

$$-c_1\bar{x}^3 + b_1\bar{x}^2 + a_1\bar{x} + 1 = 0 \tag{1-3.8}$$


Hence we have reduced our problem to the new problem: find 
the real root of a cubic equation in the range [0,1]. We solve this 
problem in good approximation by taking advantage of the remark- 
able properties of the Chebyshev polynomials (cf. VII, 9) which 
enable us to reduce a higher power to lower powers with a small 
error. In particular, the third Chebyshev polynomial 


$$T_3^*(x) = 32x^3 - 48x^2 + 18x - 1 \tag{1-3.9}$$

normalized to the range $[0,1]$ gives 

$$x^3 = \frac{48x^2 - 18x + 1}{32} = 1.5x^2 - 0.5625x + 0.03125 \tag{1-3.10}$$

with a maximum error of $\pm\frac{1}{32}$. The original cubic is thus reducible 
to a quadratic, with an error not exceeding 3%. 


We now solve this quadratic, retaining only the root between 
0 and 1. 
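
The steps of this section (scaling, deciding between the ranges [0,1] and [1,∞], inversion, and the Chebyshev reduction to a quadratic) can be sketched in Python as follows. This is a minimal sketch and not a definitive implementation: the function name `crude_cubic_root`, the optional `alpha` argument, and the rule for picking the quadratic root are assumptions introduced here for illustration.

```python
import math


def crude_cubic_root(a, b, c, alpha=None):
    """Rough positive real root of xi^3 + a*xi^2 + b*xi - c = 0 (c > 0).

    Follows (1-3.2) to (1-3.10): scale, invert if needed, replace the
    cubic term by its Chebyshev approximation, solve the quadratic.
    """
    if c <= 0:
        raise ValueError("write the absolute term as -c with c > 0")
    if alpha is None:
        alpha = c ** (-1.0 / 3.0)          # (1-3.4); a rounded value also works
    # Scaled coefficients (1-3.2), with x = alpha * xi.
    a1, b1, c1 = alpha * a, alpha ** 2 * b, alpha ** 3 * c
    inverted = (1.0 + a1 + b1 - c1) < 0.0  # sign of f(1)
    if inverted:
        # Root of f lies beyond 1: work with xbar = 1/x, cf. (1-3.8).
        p3, p2, p1, p0 = c1, -b1, -a1, -1.0
    else:
        p3, p2, p1, p0 = 1.0, a1, b1, -c1
    # Chebyshev reduction (1-3.10): t^3 ~ 1.5 t^2 - 0.5625 t + 0.03125 on [0, 1].
    q2 = p2 + 1.5 * p3
    q1 = p1 - 0.5625 * p3
    q0 = p0 + 0.03125 * p3
    disc = q1 * q1 - 4.0 * q2 * q0
    if disc < 0.0:
        raise ValueError("reduced quadratic has no real root")
    s = math.sqrt(disc)
    candidates = [(-q1 + s) / (2.0 * q2), (-q1 - s) / (2.0 * q2)]
    # Assumption of this sketch: keep the candidate nearest the middle of [0, 1].
    t = min(candidates, key=lambda r: abs(r - 0.5))
    x = 1.0 / t if inverted else t
    return x / alpha                       # back to the original variable xi


if __name__ == "__main__":
    # xi^3 + 1.2 xi^2 - 17 xi - 70 = 0 has the exact root xi = 5.
    print(crude_cubic_root(1.2, -17.0, 70.0, alpha=0.25))
```

The result is only a first approximation, intended as a starting value for a later refinement.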


4. Numerical example. In actual practice $\alpha$ need not be taken 
with great accuracy but can be rounded off to two significant 
figures. Consider the solution of the following cubic: 

$$\xi^3 + 1.2\xi^2 - 17.0\xi - 70 = 0 \tag{1-4.1}$$

Barlow’s Tables give the cube root of 70 as 4.1212···, the reciprocal of 
which gives $\alpha$ = 0.2426···. We conveniently choose 

$$\alpha = 0.25 \tag{1-4.2}$$

obtaining 

$$f(x) = x^3 + 0.3x^2 - 1.0625x - 1.09375 = 0 \tag{1-4.3}$$

At $x = 1$, $f(1) = -0.856$ is still negative. The root is thus between 
$x = 1$ and $\infty$. We invert the range by putting 

$$\bar{x} = 1/x \tag{1-4.4}$$

$$1.09375\bar{x}^3 + 1.0625\bar{x}^2 - 0.3\bar{x} - 1 = 0 \tag{1-4.5}$$

The substitution (1-3.10) reduces this equation to the quadratic 

$$2.703\bar{x}^2 - 0.915\bar{x} - 0.966 = 0 \tag{1-4.6}$$

solution of which gives 

$$\bar{x} = \frac{0.915 \pm 3.370}{5.406} \tag{1-4.7}$$

The negative sign of the square root yields a spurious result, since it 
falls outside the range considered. The positive sign gives 

$$\bar{x} = 0.79 \tag{1-4.8}$$

and thus $x = 1/0.79 = 1.27$, and 

$$\xi = \frac{1.27}{0.25} = 5.08 \tag{1-4.9}$$


Substitution in the original equation shows that the left side gives 
the remainder 5.692, which in comparison with the absolute term 
70 is an error of 8%. 

The operation with large roots is numerically not advantageous. 
It is thus of considerable importance that we can always restrict 
ourselves to roots which are in absolute value less than 1, because if 
the absolute value of the root is greater than 1, the reciprocal 
transformation $\bar{x} = 1/x$, which merely inverts the polynomial, 
changes the root to its reciprocal. Hence in our example we will 
prefer to substitute the reciprocal of (9),¹ i.e., 

$$\bar{\xi} = 1/5.08 = 0.197 \tag{1-4.10}$$

into the inverted equation 

$$70\bar{\xi}^3 + 17\bar{\xi}^2 - 1.2\bar{\xi} - 1 = 0 \tag{1-4.11}$$


The remainder is now —0.0395, an error of 4% compared with the 
absolute term 1. 


1 Equations encountered in the current section are quoted by the digits after 
the decimal point only. Hence (9) refers to (1-4.9) since the equation has to be 
found among the equations of the current section. An equation in the current 
chapter, but not in the current section is quoted by Section number and equation 
digit. Hence (3.9) refers to (1-3.9). 


5. Newton’s method. If we have a good approximation $x = x_0$ 
to a root of an algebraic equation, we can improve that approximation 
by a method known as “Newton’s method.” We put 

$$x = x_0 + h \tag{1-5.1}$$

and expand $f(x_0 + h)$ into powers of $h$: 

$$f(x_0 + h) = f(x_0) + h f'(x_0) + \frac{h^2}{2} f''(x_0) + \cdots \tag{1-5.2}$$

For small $h$ the higher order terms will rapidly diminish. If we neglect 
everything beyond the second term, then the solution of the equation 

$$f(x) = f(x_0 + h) = 0 \tag{1-5.3}$$

is obtained in good approximation by 

$$h_0 = -\frac{f(x_0)}{f'(x_0)} \tag{1-5.4}$$

We can now consider 

$$x_1 = x_0 + h_0 \tag{1-5.5}$$

as a new first approximation, replacing $x_0$ by $x_1$. Hence 

$$h_1 = -\frac{f(x_1)}{f'(x_1)} \tag{1-5.6}$$

combined with 

$$x_2 = x_0 + h_0 + h_1 \tag{1-5.7}$$

is a still closer approximation of the root, and generally we obtain the 
iterative scheme 

$$h_n = -\frac{f(x_n)}{f'(x_n)} \tag{1-5.8}$$

$$x_{n+1} = x_0 + h_0 + \cdots + h_n \tag{1-5.9}$$

which converges rapidly to $x$, if $x_0$ is a sufficiently close first 
approximation. 

Newton’s scheme is not restricted to algebraic equations, but is 
equally applicable to transcendental equations. 
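
The iteration can be sketched in a few lines of Python. The function name `newton`, the stopping tolerance, and the iteration limit are assumptions of this sketch and do not appear in the text; the two test equations are likewise chosen here for illustration, one algebraic and one transcendental.

```python
import math


def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Repeat x <- x + h with h = -f(x)/f'(x) until |h| < tol."""
    x = x0
    for _ in range(max_iter):
        h = -f(x) / fprime(x)   # Newton correction at the current point
        x += h
        if abs(h) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


def cubic(x):
    """The inverted cubic 1.09375 x^3 + 1.0625 x^2 - 0.3 x - 1."""
    return ((1.09375 * x + 1.0625) * x - 0.3) * x - 1.0


def cubic_prime(x):
    """Derivative of the cubic above."""
    return (3.28125 * x + 2.125) * x - 0.3


if __name__ == "__main__":
    # Algebraic case: start from the crude value 0.79.
    print(newton(cubic, cubic_prime, 0.79))
    # Transcendental case: cos x - x = 0.
    print(newton(lambda x: math.cos(x) - x, lambda x: -math.sin(x) - 1.0, 1.0))
```

The method relies on a sufficiently close starting value; from a poor start the loop may wander or fail to converge, which is why the sketch imposes an iteration limit.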


An increase of convergence is obtainable if we stop only after the 
third term, considering the second-order term a small correction of 
the first-order term. Hence we write 

$$f(x_0) + h\left[f'(x_0) + \frac{h}{2} f''(x_0)\right] = 0 \tag{1-5.10}$$

and solve this equation in the form 

$$h = -\frac{f(x_0)}{f'(x_0) + \frac{h}{2} f''(x_0)} \tag{1-5.11}$$

replacing the $h$ in the denominator by the first approximation (4). 
This yields a formula which can best be remembered in the following 
form: 

$$\frac{1}{h} = -\frac{f'(x_0)}{f(x_0)} + \frac{1}{2}\,\frac{f''(x_0)}{f'(x_0)} \tag{1-5.12}$$


6. Numerical example for Newton’s method. In § 4¹ the cubic 
equation 

$$f(x) = 1.09375x^3 + 1.0625x^2 - 0.3x - 1 = 0 \tag{1-6.1}$$

was treated, and the approximation 

$$x_0 = 0.79 \tag{1-6.2}$$

was obtained. We substitute this value in $f(x)$ and likewise in 

$$f'(x) = 3.28125x^2 + 2.125x - 0.3 \tag{1-6.3}$$

and 

$$\tfrac{1}{2} f''(x) = 3.28125x + 1.0625 \tag{1-6.4}$$

obtaining 

$$f(x_0) = -0.034631, \quad f'(x_0) = 3.42658, \quad \tfrac{1}{2} f''(x_0) = 3.65468 \tag{1-6.5}$$

Substitution in the formula (1-5.12) gives 

$$\frac{1}{h} = 98.945453 + 1.066568 = 100.012021 \tag{1-6.6}$$

$$h = 0.009998798 \tag{1-6.7}$$

$$x_1 = x_0 + h = 0.7999988 \tag{1-6.8}$$


1 Throughout the book the § sign refers to sections of the same chapter. 


If this new $x_1$ is substituted in $f(x)$, we obtain 

$$f(x_1) = -0.00000418 \tag{1-6.9}$$


At this point we can stop, since the error is only 4 units in the 6th 
place; the coefficients of an algebraic equation are seldom given with 
more than 5 decimal place accuracy. 
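
The gain from the second-order term in formula (1-5.12) can be checked with a short Python sketch. The helper names are hypothetical; the derivative formulas are obtained here by differentiating the cubic (1-6.1) directly, and the exact root 0.8 of that cubic serves as the reference value.

```python
def improved_newton_step(f0, f1, half_f2):
    """Correction h from 1/h = -f'/f + (f''/2)/f', formula (1-5.12)."""
    return 1.0 / (-f1 / f0 + half_f2 / f1)


def cubic_values(x):
    """Return f(x), f'(x) and f''(x)/2 for the cubic (1-6.1)."""
    f0 = ((1.09375 * x + 1.0625) * x - 0.3) * x - 1.0
    f1 = (3.28125 * x + 2.125) * x - 0.3
    half_f2 = 3.28125 * x + 1.0625
    return f0, f1, half_f2


x0 = 0.79
f0, f1, half_f2 = cubic_values(x0)

x_plain = x0 - f0 / f1                                    # ordinary Newton step (1-5.4)
x_improved = x0 + improved_newton_step(f0, f1, half_f2)   # step by (1-5.12)

# The same formula applied to the rounded values quoted in (1-6.5).
h_quoted = improved_newton_step(-0.034631, 3.42658, 3.65468)

if __name__ == "__main__":
    print(abs(x_plain - 0.8), abs(x_improved - 0.8), h_quoted)
```

A single improved step reduces the error of the plain Newton step by roughly two orders of magnitude in this example, at the cost of one extra evaluation.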


7. Horner’s scheme. Direct substitution of a number into a 
polynomial is simple enough if the number is real and the polynomial 
is of low order. In view of later occasions, however, when poly- 
nomials of higher order have to be considered and the numbers to be 
substituted are complex, we will now discuss a numerically more 
elegant scheme, called “Horner’s scheme,” which obtains $f(x)$, 
$f'(x)$, ··· by a process of synthetic divisions. 

We consider the algebraic division 

$$\frac{f(x) - f(x_0)}{x - x_0} = f_1(x) \tag{1-7.1}$$

If $f(x)$ is a polynomial of the order $n$, $f_1(x)$ is a polynomial of the 
order $n - 1$. We can continue the process and gradually decrease 
the order of the resulting polynomial to zero: 

$$\frac{f_1(x) - f_1(x_0)}{x - x_0} = f_2(x) \tag{1-7.2}$$

$$\frac{f_2(x) - f_2(x_0)}{x - x_0} = f_3(x) \tag{1-7.3}$$

What we accomplish by this successive decomposition of the given 
polynomial $f(x)$ is that we automatically obtain the successive 
coefficients of the Taylor expansion about the point $x = x_0$. Indeed, 
multiplying through our equations by $x - x_0$ and making successive 
substitutions, we obtain 

$$f(x) = f(x_0) + f_1(x_0)(x - x_0) + f_2(x_0)(x - x_0)^2 + \cdots \tag{1-7.4}$$


The coefficients of the Taylor expansion thus appear as the successive 
remainders of a sequence of synthetic divisions. This process is known 
as “Horner’s scheme” [W. G. Horner (1819), also P. Ruffini (1804)]. 


8. The movable strip technique. In numerical work frequently a 
great deal of time is lost by noting down partial results which could 
have been avoided by a more concise arrangement of the calculations. 
One particular device which is of great help in many numerical 
algorithms, involves a “movable strip.” We formulate the algorithm 
of the movable strip in terms of desk calculations, but the technique 
can easily be coded for electronic computers, too. 

A certain fixed set of numbers is written down on a vertical strip of 
paper. This movable column operates on another given column, the 
“fixed strip.” The operation consists in multiplying two numbers 
facing each other, one of the movable strip and one of the fixed strip. 
The partial products are summed and the result written down on a 
third strip, the “nascent strip.” Now the movable strip glides 
vertically downward by one step, the operation is repeated, and the 
next element of the “nascent strip” is obtained. Thus we continue 
until the movable strip arrives at the bottom of the fixed strip. Hence 
we obtain an arrangement which is demonstrated in the following 
numerical scheme: 


Movable     Fixed     Nascent 
 strip      strip      strip 
  −3 
  −2 
   4 
   1          2          2 
             −3          5 
             −5        −21 
              0        −20 
              1         20          (1-8.1) 


In this arrangement the sequence of operations was not decisive, 
since the results written down on the “nascent strip” have no 
influence on the later operations. The nascent strip of the scheme (1) 
could have been obtained by starting the movable strip from the 
bottom and moving upwards or in any other sequence. Frequently, 
however, another kind of algorithm is encountered in which the 
nascent strip is put between the movable strip and the fixed strip. 
This is a “feed-back” arrangement which can be performed only in 
the right sequence. The movable strip now operates on the nascent 
strip, and only the lowest element of the movable strip (the 1 in the 
above example) reaches over to the fixed strip. The algorithm 
operates now as follows: 


Movable    Nascent     Fixed 
 strip      strip      strip 
  −3 
  −2 
   4 
   1          2          2 
              5         −3 
             11         −5 
             28          0 
             76          1          (1-8.2) 


The first kind of arrangement will be encountered later in all 
procedures which involve the weighting of data, such as local 
smoothing, differentiation of an empirically given function, etc. 
(cf. V, 8, 10). But all problems associated with division of poly- 
nomials require the second kind of arrangement. 
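
Both arrangements are easily coded. The following Python sketch reproduces the two numerical schemes above; the function names `strip_forward` and `strip_feedback` are hypothetical, and the strips are passed as lists read from top to bottom.

```python
def strip_forward(movable, fixed):
    """First arrangement: the movable strip weights the fixed strip only."""
    weights = movable[::-1]          # weights[0] faces the current fixed element
    nascent = []
    for k in range(len(fixed)):
        total = 0
        for j, w in enumerate(weights):
            if k - j >= 0:           # elements above the top of the strip are absent
                total += w * fixed[k - j]
        nascent.append(total)
    return nascent


def strip_feedback(movable, fixed):
    """Second arrangement: only the lowest element of the movable strip
    reaches the fixed strip; the others act on the nascent strip itself."""
    weights = movable[::-1]
    nascent = []
    for k in range(len(fixed)):
        total = weights[0] * fixed[k]
        for j in range(1, len(weights)):
            if k - j >= 0:
                total += weights[j] * nascent[k - j]   # feed-back of earlier results
        nascent.append(total)
    return nascent


if __name__ == "__main__":
    movable = [-3, -2, 4, 1]
    fixed = [2, -3, -5, 0, 1]
    print(strip_forward(movable, fixed))    # [2, 5, -21, -20, 20]
    print(strip_feedback(movable, fixed))   # [2, 5, 11, 28, 76]
```

In the first function the order of the outer loop is immaterial; in the second it is essential, since each new element depends on those already written down.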

As a simple application of the movable strip technique, let us 
consider the synthetic division of a given polynomial by $x - x_0$. The 
movable strip will now contain but two elements, viz., $x_0$, 1 in vertical 
arrangement. The fixed strip will contain the consecutive coefficients 
$a_n, a_{n-1}, \cdots, a_0$ of the given polynomial, likewise in vertical 
arrangement. The results are written down in succession between these two 
strips. 

Finally we arrive at the last element of our scheme, when the 1 of 
the movable strip reaches over to the $a_0$ of the fixed strip. If the 
result is 0, we know that $x - x_0$ is a root factor of the given polynomial, 
and the nascent strip contains the successive coefficients of 
the ratio $p_n(x)/(x - x_0)$. If the result is not 0, we obtain the remainder 
of the division process. We do not write this last element in the nascent 
column but transfer it to a separate “remainder” column, filling out 
the last element of the nascent strip by zero. 


We demonstrate this technique by obtaining the result of substituting 
$x = 0.79$ into the cubic (1-6.1) on the basis of synthetic division: 

Movable     Nascent        Fixed 
 strip       strip         strip       Remainder 
 0.79 
 1         1.09375        1.09375 
           1.9265625      1.0625 
           1.2219844     −0.3 
           0             −1           −0.03463234      (1-8.3) 


Horner’s scheme is obtained by repeating this algorithm again and 
again. In each instance the “nascent strip” of the previous step 
becomes the “fixed strip” of the next step. For the sake of neater 
arrangement we will write the remainder in each instance at the 
bottom of the fixed strip in question. The complete synthetic 
division scheme of problem (3) now becomes 


  1.09375     1.09375      1.09375      1.09375 
  1.09375     2.790625     1.9265625    1.0625 
              3.6546875    1.2219844   −0.3 
                           3.4265781   −1 
                                       −0.0346323 

Hence 

$$f(0.79 + h) = 1.09375h^3 + 3.6546875h^2 + 3.4265781h - 0.0346323$$

In actual practice we can frequently stop after three divisions, since 
$f(x_0)$, $f'(x_0)$, and $\tfrac{1}{2}f''(x_0)$ are sufficient for application of formula (5.12). 
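
The repeated synthetic division can be sketched in Python as follows. The function names are hypothetical; coefficients are listed from the highest power downward, as on the fixed strip.

```python
def synthetic_division(coeffs, x0):
    """Divide a polynomial (highest power first) by (x - x0).

    Returns the quotient coefficients and the remainder, i.e. the
    nascent strip and the separate remainder of the strip scheme.
    """
    nascent = []
    carry = 0.0
    for c in coeffs:
        carry = c + x0 * carry      # the strip (x0, 1) moved down by one step
        nascent.append(carry)
    return nascent[:-1], nascent[-1]


def taylor_coefficients(coeffs, x0):
    """Horner's scheme: the successive remainders are the coefficients of
    the expansion in powers of h = x - x0, lowest power first."""
    result = []
    current = list(coeffs)
    while current:
        current, remainder = synthetic_division(current, x0)
        result.append(remainder)
    return result


if __name__ == "__main__":
    # The cubic (1-6.1) expanded about x0 = 0.79.
    print(taylor_coefficients([1.09375, 1.0625, -0.3, -1.0], 0.79))
```

The first three entries of the returned list are $f(x_0)$, $f'(x_0)$, and $\tfrac{1}{2}f''(x_0)$, so the loop may be cut short when only these are wanted.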


9. The remaining roots of the cubic. In the present example 
formula (5.12) gave the improved root (6.8). We repeat the synthetic 
division with this new root, but do not go beyond the first step: 


   1               1 
   1.86909628      1.041667 
   1.24892599     −0.297619 
                  −1.033399       −0.00000107      (1-9.1) 

Since the remainder is already practically negligible, we have reduced 
the given cubic to the quadratic 

$$x^2 + 1.869096x + 1.248926 = 0 \tag{1-9.2}$$

which can be solved by the standard formula. We obtain 

$$x_2 = -0.934548 + 0.612818i, \qquad x_3 = -0.934548 - 0.612818i \tag{1-9.3}$$

Going back to the original equation (4.1) by taking the reciprocals 
and dividing by 0.24, we finally obtain the three roots 

$$\xi_1 = 5.035677, \qquad \xi_2 = -3.117839 + 2.044483i, \qquad \xi_3 = -3.117839 - 2.044483i \tag{1-9.4}$$


10. Substitution of a complex number into a polynomial. Multi- 
plication of two complex numbers is much more cumbersome than 
multiplication of two real numbers. Hence the substitution of a 
complex number $x_0 = a + ib$ into a polynomial is cumbersome even 
if synthetic division is used. The following method of substitution 
has the advantage that it reduces the operation with complex numbers 
to a minimum. 

In the ordinary synthetic division scheme we would divide $f(x)$ by 
the root factor $x - x_0 = x - a - ib$, which involves the products of 
complex numbers from the beginning. Let us divide, however, by 
the real quantity (the notation “asterisk” refers to “conjugate 
complex”): 

$$(x - x_0)(x - x_0^*) = (x - a)^2 + b^2 = x^2 - 2ax + (a^2 + b^2) \tag{1-10.1}$$

The result of the division can be written as follows: 

$$\frac{f(x)}{(x - x_0)(x - x_0^*)} = g(x) + \frac{Ax + B}{(x - x_0)(x - x_0^*)} \tag{1-10.2}$$

Now the remainder is not a mere constant but the linear term 
$Ax + B$. Multiplying through by the denominator on both sides we 
find 

$$f(x_0) = Ax_0 + B \tag{1-10.3}$$


This modification of the ordinary synthetic division scheme has thus 
the advantage that substitution of the complex number $x_0$ occurs 
only at the end, and only in the form $Ax_0$. Up to that point all 
operations are real. 

The movable strip is now composed of the numbers $-(a^2 + b^2)$, 
$2a$, 1. The operations proceed once more in the previous fashion, 
with the only difference that the synthetic division is finished one 
step before the last coefficient of f(x) has been reached. Hence the 
process gives two remainder coefficients; the first is A, the second B. 
They are transferred to a separate “remainder” column to the right 
of the fixed column, while the last two elements of the nascent strip 
become 0. 

Example. Substitute the complex number $x_0 = 0.3 - 0.5i$ in 

$$f(x) = x^4 - 2x^3 + 4x^2 - 2x + 1 \tag{1-10.4}$$

  M.S.      Quotient      f(x)      Remainder 
 −0.34 
  0.6 
  1           1             1 
             −1.4          −2 
              2.82          4 
              0            −2         0.168 
              0             1         0.0412        (1-10.5) 

The remainder is thus $0.168x + 0.0412$, and we obtain 

$$f(0.3 - 0.5i) = 0.168(0.3 - 0.5i) + 0.0412 = 0.0916 - 0.084i \tag{1-10.6}$$
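
The scheme can be sketched in Python as follows. The function name `substitute_complex` is hypothetical; the polynomial is assumed to have real coefficients, listed from the highest power downward, and to be at least of the first order.

```python
def substitute_complex(coeffs, x0):
    """Evaluate a real polynomial at the complex number x0 = a + ib by
    real synthetic division with x^2 - 2a x + (a^2 + b^2).

    Returns (A, B, A*x0 + B); only the final product is complex.
    """
    if len(coeffs) < 2:
        raise ValueError("need a polynomial of at least first order")
    a, b = x0.real, x0.imag
    p = 2.0 * a                  # middle element of the movable strip
    q = -(a * a + b * b)         # top element of the movable strip
    y1 = y2 = 0.0                # the two most recent quotient coefficients
    for c in coeffs[:-2]:
        y1, y2 = c + p * y1 + q * y2, y1
    A = coeffs[-2] + p * y1 + q * y2
    B = coeffs[-1] + q * y1      # the nascent element facing p is now 0
    return A, B, A * x0 + B


if __name__ == "__main__":
    A, B, value = substitute_complex([1.0, -2.0, 4.0, -2.0, 1.0], 0.3 - 0.5j)
    print(A, B, value)
```

For the quartic (1-10.4) the sketch returns the remainder coefficients of scheme (1-10.5), and the value agrees with a direct complex evaluation of the polynomial.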

However, Horner’s scheme is not applicable if this technique is 
adopted. We now have to form $f'(x)$ and $f''(x)$ by actual differentiations 
and apply the process to these polynomials. 

In the present example, if we consider the given complex number as 
a preliminary root $x_0$ which shall be corrected by Newton’s method, 
the scheme continues as follows: 


          f′(x)                              ½f″(x) 
   4        4 
  −3.6     −6                         6        6 
   0        8       4.48              0       −6      −2.4 
   0       −2      −0.776             0        4       1.96       (1-10.7) 

Hence 

$$f'(x_0) = 4.48(0.3 - 0.5i) - 0.776 = 0.568 - 2.24i$$

$$\tfrac{1}{2}f''(x_0) = -2.40(0.3 - 0.5i) + 1.96 = 1.24 + 1.2i$$

Application of formula (5.12) gives 

$$\frac{1}{h} = \frac{1.24 + 1.2i}{0.568 - 2.24i} - \frac{0.568 - 2.24i}{0.0916 - 0.084i} \tag{1-10.8}$$

and we obtain $h = -0.042909 - 0.029222i$. 
Hence the corrected root becomes 

$$x_1 = 0.257091 - 0.529222i \tag{1-10.9}$$

Substituting once more in $f(x)$ yields 

   1               1 
  −1.485818       −2 
   2.889847        4 
   0              −2        0.000256 
   0               1       −0.000383        (1-10.10) 

The new remainder becomes 

$$f(x_1) = 0.000256x_1 - 0.000383 = -0.000318 - 0.000135i$$


An estimation of the accuracy of the root $x_1$ can be obtained as 
follows. Since $x_1$ is very near to $x_0$, $f'(x_1)$ is only slightly different 
from $f'(x_0)$. 

The next correction in Newton’s method requires the evaluation of 
$-f(x_1)/f'(x_1)$, which for estimation purposes can be replaced by 
$-f(x_1)/f'(x_0)$. Since the absolute value of $f'(x_0)$ is more than 2, the 
remainder $f(x_1)$ shows that the error of $x_1$ cannot be more than 1.5 
units in the fourth decimal. If we are satisfied with this accuracy, we 
can consider 

$$x_1 = 0.2571 - 0.5292i$$


as the final root. Then the coefficients of the division process (10) 
yield the reduced equation 


$$x^2 - 1.4858x + 2.8898 = 0 \tag{1-10.11}$$


the solution of which gives the other pair of complex roots. 


11. Equations of fourth order. Algebraic equations of fourth 
order with generally complex roots occur frequently in the stability 
analysis of airplanes and in problems involving servomechanisms. 
The historical method of solving algebraic equations of fourth order 
(also called biquadratic or quartic equations) involves the following 
steps. By a transformation of the form $x + \alpha$ the coefficient of the 
cubic term is annihilated. Then an auxiliary cubic equation is 
solved. The roots of the original equation are constructed with the 
help of the three roots of the auxiliary cubic. Numerically this 
method is lengthy and cumbersome. The following modification of 
the traditional procedure yields the four roots of an arbitrary quartic 
equation with real coefficients on the basis of a quick and numerically 
convenient scheme. 

Every equation of the form 


$$x^4 + c_1x^3 + c_2x^2 + c_3x + c_4 = 0 \tag{1-11.1}$$

can be rewritten as follows: 

$$(x^2 + \alpha x + \beta)^2 = (ax + b)^2 \tag{1-11.2}$$

If the original $c_i$ are real, the new coefficients are also real. Hence 
the original equation becomes solvable in form of the quadratic 
equation 

$$x^2 + \alpha x + \beta \pm (ax + b) = 0 \tag{1-11.3}$$

which has four (generally complex) roots, obtainable by the standard 
formula. The new coefficients can be determined as follows. We 
evaluate in succession the following numerical constants: 

$$\alpha = \frac{c_1}{2}, \quad A = c_2 - \alpha^2, \quad B = c_3 - \alpha A \tag{1-11.4}$$

and form the cubic equation 

$$\xi^3 + (2A - \alpha^2)\xi^2 + (A^2 + 2B\alpha - 4c_4)\xi - B^2 = 0 \tag{1-11.5}$$

Since the left side is negative at $\xi = 0$, a positive real root must exist. 
We determine this root according to the method of § 3. In order to 
avoid later corrections, it is advisable to add at this point Newton’s 
correction (cf. § 5), obtaining $\xi$ with great accuracy. The coefficients 
of the reduced equation (3) are then determined as follows:¹ 

$$\alpha = \frac{c_1}{2}, \quad \beta = \tfrac{1}{2}(A + \xi), \quad a = \sqrt{\xi}, \quad b = \frac{a}{2}\left(\alpha - \frac{B}{\xi}\right) \tag{1-11.6}$$


Numerical example. We will demonstrate the general procedure by 
solving the following quartic equation: 
$$x^4 + 7.64x^3 + 23.6044x^2 + 38.91024x + 38.149496 = 0$$

Here 

$$c_1 = 7.64, \quad c_2 = 23.6044, \quad c_3 = 38.91024, \quad c_4 = 38.149496$$

and we obtain by substituting in the general equations (4): 

$$\alpha = 3.82, \quad A = 9.012, \quad B = 4.4844$$

The cubic equation (5) becomes: 

$$\xi^3 + 3.431600\xi^2 - 37.121024\xi - 20.109843 = 0$$

We make the substitution 

$$\xi = \frac{\eta}{0.37}$$

obtaining 

$$\eta^3 + 1.269692\eta^2 - 5.081868\eta - 1.018624 = 0$$

Since the left side is still negative at $\eta = 1$, we invert the equation by 
going to the reciprocal root (we divide by 1.018624): 

$$\bar{\eta}^3 + 4.988954\bar{\eta}^2 - 1.246478\bar{\eta} - 0.981717 = 0$$

$$\bar{\eta}^3 \approx 1.5\bar{\eta}^2 - 0.5625\bar{\eta} + 0.03125$$

$$6.488954\bar{\eta}^2 - 1.808978\bar{\eta} - 0.950467 = 0$$

$$\bar{\eta}^2 - 0.2788\bar{\eta} - 0.1465 = 0$$

$$\bar{\eta}_0 = 0.1394 + \sqrt{0.1658} = 0.5466$$


1 The exceptional case $B = 0$ deserves special attention. Then (5) has the 
solution $\xi = 0$ and we obtain by a limit process: 
$$\alpha = \tfrac{1}{2}c_1, \quad \beta = \tfrac{1}{2}A, \quad a = 0, \quad b = \tfrac{1}{2}\sqrt{A^2 - 4c_4}$$
If, however, $A^2 - 4c_4$ happens to be negative, it is preferable to divide (5) by 
$\xi$ and solve the resulting quadratic for the positive root. This is then substituted 
in the general formulas (6) (with $B = 0$). 




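We will correct this root by Newton's method, using the synthetic
division scheme of Horner (cf. § 7 and § 8), here displayed in
horizontal arrangement:

$$\begin{array}{rrrr}
1 & 4.988954 & -1.246478 & -0.981717 \\
1 & 5.535554 & 1.779256 & -0.009176 \\
1 & 6.082154 & 5.103761 & \\
1 & 6.628754 & &
\end{array}$$

$$\frac{1}{h} = \frac{5.103761}{0.009176} + \frac{6.628754}{5.103761} = 556.2076 + 1.2988 = 557.5064$$

$$h = 0.001794, \qquad \bar\eta = 0.5466 + 0.001794 = 0.548394$$

$$\bar\eta = 0.548394, \quad \eta = 1.823506, \quad \xi = 4.928394$$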
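The horizontal scheme above can be mimicked in a few lines. The following is only a sketch under our own assumptions: the function names `horner_derivs` and `corrected_root` are illustrative, the polynomial is assumed to be of at least second order, and the correction is taken in the form $1/h = -f'/f + \tfrac{1}{2}f''/f'$ that the numbers of the scheme reproduce.

```python
def horner_derivs(coeffs, x0):
    """Return f(x0), f'(x0) and f''(x0)/2 by three successive synthetic divisions.

    coeffs lists the coefficients from the highest power down
    (polynomial of order >= 2 assumed).
    """
    work = list(coeffs)
    values = []
    for _ in range(3):
        acc = 0.0
        row = []
        for c in work:
            acc = acc * x0 + c
            row.append(acc)
        values.append(row.pop())  # last entry of each row is the remainder
        work = row                # the rest is the quotient for the next pass
    return tuple(values)


def corrected_root(coeffs, x0):
    """Apply one correction step 1/h = -f'/f + (f''/2)/f' to the estimate x0."""
    f, f1, f2_half = horner_derivs(coeffs, x0)
    inv_h = -f1 / f + f2_half / f1
    return x0 + 1.0 / inv_h


cubic = [1.0, 4.988954, -1.246478, -0.981717]
improved = corrected_root(cubic, 0.5466)
```

With the inverted cubic of the example and the preliminary value 0.5466, the three remainders agree with the last entries of the rows of the scheme, and `improved` agrees with the corrected value quoted above to about six decimals.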
Substitution in the equations (6) yields

$$\alpha = 3.82, \quad \beta = 6.970197, \quad a = 2.219998, \quad b = 3.230196$$

Hence the reduced quadratic equation becomes

$$x^2 + 3.82x + 6.970197 \pm (2.219998x + 3.230196) = 0$$

(a) $x^2 + 6.039998x + 10.200393 = 0$
$x = -3.019999 \pm 1.039230i$
(correct: $-3.02 \pm 1.03923048i$)
(b) $x^2 + 1.600002x + 3.740001 = 0$
$x = -0.800001 \pm 1.760681i$
(correct: $-0.8 \pm 1.76068169i$)
Great accuracy results, however, even if we do not correct the
preliminary value $\bar\eta_0$, but accept it as $\bar\eta$. Then

$$\bar\eta = 0.5466, \quad \eta = 1.8295, \quad \xi = 4.9446$$

and substitution in (6) now gives

$$\alpha = 3.82, \quad \beta = 6.9783, \quad a = 2.2236, \quad b = 3.2388$$

which leads to the reduced equation

$$x^2 + 3.82x + 6.9783 \pm (2.2236x + 3.2388) = 0$$


The four roots of this equation are

$$x = -3.0218 \pm 1.0420i \quad \text{and} \quad x = -0.7982 \pm 1.7614i$$


A good check on the accuracy to be expected is available by forming 
the product of all four roots. This should be equal to $c_4$. In our case
the first set of roots gives 38.149461 (correct: 38.149496). The 
second set of roots gives 38.2081. 
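The whole procedure of this section can be condensed into a short routine. What follows is a minimal sketch, not the hand scheme of the text: the name `solve_quartic` is ours, and plain bisection is used as a stand-in for the method of § 3 followed by Newton's correction. The exceptional case B = 0 of the footnote is not treated.

```python
import cmath


def solve_quartic(c1, c2, c3, c4):
    """Roots of x^4 + c1 x^3 + c2 x^2 + c3 x + c4 = 0 via (1-11.3) to (1-11.6)."""
    alpha = c1 / 2.0
    A = c2 - alpha ** 2
    B = c3 - alpha * A
    if B == 0.0:
        raise ValueError("B = 0 is the exceptional case and is not handled here")

    def cubic(xi):
        # Left side of (1-11.5).
        return (xi ** 3 + (2 * A - alpha ** 2) * xi ** 2
                + (A ** 2 + 2 * B * alpha - 4 * c4) * xi - B ** 2)

    # cubic(0) = -B^2 < 0, so doubling hi eventually brackets a positive root.
    lo, hi = 0.0, 1.0
    while cubic(hi) <= 0.0:
        hi *= 2.0
    for _ in range(200):  # bisection (assumption: replaces § 3 plus Newton)
        mid = 0.5 * (lo + hi)
        if cubic(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)

    beta = 0.5 * (A + xi)
    a = xi ** 0.5
    b = 0.5 * a * (alpha - B / xi)

    roots = []
    for sign in (1.0, -1.0):
        # Each sign in (1-11.3) gives one quadratic x^2 + p x + q = 0.
        p = alpha + sign * a
        q = beta + sign * b
        d = cmath.sqrt(p * p / 4.0 - q)
        roots.extend([-p / 2.0 + d, -p / 2.0 - d])
    return roots


roots = solve_quartic(7.64, 23.6044, 38.91024, 38.149496)
```

For the quartic of the numerical example the four values in `roots` agree with the roots marked "correct" above, and their product reproduces $c_4$, which is the check recommended in the text.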


12. Equations of higher order. Newton’s correction scheme is 
an important tool in the gradual evaluation of the roots of an 
algebraic equation, if we can start with a crude approximation. 
Unfortunately, we are not in the possession of any direct methods 
for approximate localization of the roots of equations of higher than 
fourth order. The real roots of an algebraic equation can be found 
with relatively little labor. For this purpose we divide the interval 
between −1 and +1 into, let us say, 10 equal parts, evaluating f(x) at
intervals of 0.2. We observe the change of sign and localize the root 
more exactly by linear interpolation. The synthetic division scheme 
of § 7 will then refine this root to the desired accuracy. The roots 
outside the interval —1 to +1 are now transformed inside by the 
reciprocal transformation (3.7), i.e., by inverting the sequence of the 
coefficients, and once more we scan the same interval in units of 0.2. 
Hence with relatively little labor all the real roots of a polynomial 
can be located. 
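The scanning procedure just described may be sketched as follows. The function names are ours, and the sketch stops at the crude values obtained by linear interpolation; the refinement by the scheme of § 7 is left out.

```python
def eval_poly(coeffs, x):
    """Evaluate a polynomial given by coefficients from the highest power down."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc


def scan_unit_interval(coeffs, parts=10):
    """Crude real roots in [-1, 1]: sign changes on an equidistant grid,
    located by linear interpolation."""
    xs = [-1.0 + 2.0 * k / parts for k in range(parts + 1)]
    ys = [eval_poly(coeffs, x) for x in xs]
    found = []
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if y0 == 0.0:
            found.append(x0)
        elif y0 * y1 < 0.0:
            found.append(x0 - y0 * (x1 - x0) / (y1 - y0))
    if ys[-1] == 0.0:
        found.append(xs[-1])
    return found


def crude_real_roots(coeffs, parts=10):
    """Roots inside [-1, 1], then roots outside via the reciprocal transformation."""
    inside = scan_unit_interval(coeffs, parts)
    # Inverting the sequence of coefficients gives the polynomial in 1/x.
    reciprocal = scan_unit_interval(coeffs[::-1], parts)
    outside = [1.0 / t for t in reciprocal if t != 0.0 and abs(t) < 1.0]
    return inside + outside


approx = crude_real_roots([1.0, -3.3, 0.9])  # (x - 0.3)(x - 3)
```

For the illustrative polynomial $(x - 0.3)(x - 3)$, which is our own test case, the scan finds one root inside the interval and one root by way of the inverted coefficients, each within a few percent of the true value.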

However, a general polynomial need not have any real roots, and 
the polynomials encountered in vibration and stability problems 
are usually exactly of this type. Hence it is of importance that there 
exist a method, called the “method of moments,” first described by 
Daniel Bernoulli (1728), which puts us in the position to locate the 
absolutely largest root of an algebraic equation of any order with 
comparative ease. 


13. The method of moments. If we resolve a polynomial of
nth order into its root factors:

$$f(x) = a_0x^n + a_1x^{n-1} + \cdots + a_n = a_0(x - x_1)(x - x_2)\cdots(x - x_n) \tag{1-13.1}$$


and differentiate logarithmically, we obtain a formula which is
particularly useful in the study of the algebraic behavior of a
polynomial:

$$\frac{f'(x)}{f(x)} = \frac{1}{x - x_1} + \frac{1}{x - x_2} + \cdots + \frac{1}{x - x_n} \tag{1-13.2}$$


Let $x = x_0$ be a point which is not far from the root $x = x_1$. Then

$$-\frac{f'(x_0)}{f(x_0)} = \frac{1}{x_1 - x_0} + \frac{1}{x_2 - x_0} + \cdots + \frac{1}{x_n - x_0} \tag{1-13.3}$$

and we see that Newton's method of evaluating the correction of $x_0$
from the formula

$$\frac{1}{h} = -\frac{f'(x_0)}{f(x_0)} \tag{1-13.4}$$

amounts to reducing the right side to the first term. This will be
the more justified, the nearer $x_0$ is to the true root $x_1$.
Let us put in particular $x_0 = 0$ and obtain

$$-\frac{f'(0)}{f(0)} = \frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n} \tag{1-13.5}$$

If it so happens that one root, say $x_1$, is much nearer to the origin than
all the other roots, then

$$\frac{1}{h} = -\frac{f'(0)}{f(0)} \tag{1-13.6}$$

will give a close approximation of that root. But this approximation
loses increasingly in value as the closeness of $x_1$ to the origin becomes
less pronounced.

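Let us now differentiate the function (2) m − 1 times. We thus
obtain

$$\frac{(-1)^{m-1}}{(m-1)!}\,\frac{d^{m-1}}{dx^{m-1}}\left[\frac{f'(x)}{f(x)}\right] = \frac{1}{(x - x_1)^m} + \frac{1}{(x - x_2)^m} + \cdots + \frac{1}{(x - x_n)^m} \tag{1-13.7}$$

and putting x = 0:

$$-\frac{1}{(m-1)!}\left[\frac{d^{m-1}}{dx^{m-1}}\,\frac{f'(x)}{f(x)}\right]_{x=0} = \frac{1}{x_1^m} + \frac{1}{x_2^m} + \cdots + \frac{1}{x_n^m} \tag{1-13.8}$$

By this method we put the spotlight on the nearest root $x_1$ with
increasing sharpness, even if the closeness of $x_1$ is not very pronounced
in the beginning.
We can thus generalize formula (6) to

$$\frac{1}{h^m} = -\frac{1}{(m-1)!}\left[\frac{d^{m-1}}{dx^{m-1}}\,\frac{f'(x)}{f(x)}\right]_{x=0} \tag{1-13.9}$$

and obtain a good approximation of the nearest root $x_1$ by choosing m
sufficiently high.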

This general idea can be elaborated to a valuable method for 
locating the nearest root of an algebraic equation. The actual 
differentiation of the function (2) would be a cumbersome task. We 
can obtain, however, the quantities (9) by a simple division scheme 
which generalizes the synthetic division method of § 8 and § 9. 

It is preferable to think of the roots of the inverted polynomial 
which are the reciprocals of the original roots. In these terms the 
quantities (8) become 


$$S_m = x_1^m + x_2^m + \cdots + x_n^m \tag{1-13.10}$$


They are called the “symmetric moments of the roots” but the name 
“power sums” is also frequently used. In the reciprocal plane the 
previously nearest root changed to the most remote root. 

Formula (9) is now converted into 


$$h^m = x_1^m + x_2^m + \cdots + x_n^m = S_m \tag{1-13.11}$$


The following paragraph gives a simple and elegant numerical scheme
for successive generation of the moments $S_m$.


14. Synthetic division of two polynomials. The movable strip 
technique can be successfully employed for generation of the 
symmetric root functions $S_m$. Let us write the coefficients of the
polynomial A(x) in a vertical column, starting with the highest
coefficient and ending with the lowest (“fixed strip”). Moreover, let 
B(x) be a polynomial whose highest coefficient is normalized to 1. 
We write the coefficients of B(x) on a movable strip, starting with 1 
and moving upward. The sign of each coefficient is reversed, with the 
exception of the highest coefficient 1 which remains unchanged. 

The movable strip technique now generates the ratio of the two 
polynomials A(x) and B(x). 


Example. Divide $6x^6 - 3x^5 + x^3 - x^2$ by $x^3 + 2x^2 + 4x - 1$.

$$\begin{array}{r|r|r|r}
B(x) & \text{Quotient} & A(x) & \text{Remainder} \\ \hline
1 & 6 & 6 & \\
-4 & -15 & -3 & \\
-2 & 6 & 0 & \\
1 & 55 & 1 & \\
 & 0 & -1 & -150 \\
 & 0 & 0 & -214 \\
 & 0 & 0 & 55
\end{array} \tag{1-14.1}$$


Hence

$$\frac{6x^6 - 3x^5 + x^3 - x^2}{x^3 + 2x^2 + 4x - 1} = 6x^3 - 15x^2 + 6x + 55 + \frac{-150x^2 - 214x + 55}{x^3 + 2x^2 + 4x - 1}$$
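The movable strip operation is easily expressed in code. The following sketch (the name `divide_polynomials` is ours) lists the coefficients from the highest power down and assumes, as in the text, that the highest coefficient of B(x) is normalized to 1.

```python
def divide_polynomials(a, b):
    """Divide A(x) by B(x); return (quotient, remainder) coefficient lists.

    Both lists run from the highest power down; b[0] must be 1.
    """
    if b[0] != 1:
        raise ValueError("normalize B(x) so that its highest coefficient is 1")
    work = list(a)
    steps = max(len(a) - len(b) + 1, 0)
    quotient = []
    for k in range(steps):
        q = work[k]                 # next entry of the quotient column
        quotient.append(q)
        for j in range(1, len(b)):  # strip with reversed signs, shifted by k
            work[k + j] -= q * b[j]
    return quotient, work[steps:]


quotient, remainder = divide_polynomials([6, -3, 0, 1, -1, 0, 0], [1, 2, 4, -1])
```

Applied to the example above, `quotient` holds the entries 6, −15, 6, 55 and `remainder` the entries −150, −214, 55 of the scheme (1-14.1).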


If this technique is applied to the ratio f'(x)/f(x) we have the
difficulty that the numerator is of the order n − 1, the denominator
of the order n. We make the division possible by multiplying the
numerator by an arbitrarily high power $x^N$. Then the scheme can
continue indefinitely, giving a polynomial of the order N − 1. If
now we divide by $x^N$, we obtain the quotient in the form

$$\frac{f'(x)}{f(x)} = \frac{c_0}{x} + \frac{c_1}{x^2} + \frac{c_2}{x^3} + \cdots \tag{1-14.2}$$

where the coefficients $c_0, c_1, c_2, \cdots$ are the successive entries of the
"quotient" column.

On the other hand, the expansion of (13.2) in reciprocal powers of
x gives

$$\frac{f'(x)}{f(x)} = \frac{1}{x}\left[\frac{1}{1 - x_1/x} + \cdots + \frac{1}{1 - x_n/x}\right] = \frac{S_0}{x} + \frac{S_1}{x^2} + \frac{S_2}{x^3} + \cdots \tag{1-14.3}$$

The comparison of (2) and (3) shows that

$$c_m = S_m \tag{1-14.4}$$


The movable strip technique thus provides the successive power sums,
up to any order m.
Example. Obtain the successive power sums of the polynomial

$$x^4 + 7.60x^3 + 23.34x^2 + 38.44x + 37.40 \tag{1-14.5}$$

$$\begin{array}{r|r|r}
m & S_m & f'(x) \\ \hline
0 & 4 & 4 \\
1 & -7.60 & 22.80 \\
2 & 11.08 & 46.68 \\
3 & -22.144 & 38.44 \\
4 & 52.2312 & \\
5 & -21.7914 & \\
6 & -616.6382 & \\
7 & 4015.4799 &
\end{array} \tag{1-14.6}$$
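The generation of the power sums by division of $x^N f'(x)$ by f(x) can be sketched as below. The name `power_sums` is ours, and the polynomial is assumed to have its highest coefficient normalized to 1.

```python
def power_sums(coeffs, count):
    """S_0 .. S_{count-1} of a monic polynomial (coefficients highest power first),
    obtained as the quotient column of x^N f'(x) divided by f(x)."""
    if coeffs[0] != 1:
        raise ValueError("normalize the highest coefficient to 1")
    n = len(coeffs) - 1
    # Fixed strip: coefficients of f'(x), followed by zeros (the factor x^N).
    strip = [(n - k) * coeffs[k] for k in range(n)] + [0.0] * count
    sums = []
    for m in range(count):
        s = strip[m]
        sums.append(s)
        for j in range(1, n + 1):   # movable strip: f(x) with reversed signs
            strip[m + j] -= s * coeffs[j]
    return sums


S = power_sums([1, 7.60, 23.34, 38.44, 37.40], 13)
```

For the polynomial (1-14.5) the list `S` starts with 4, −7.60, 11.08, −22.144, 52.2312, and its later entries agree with the tabulated values to within the rounding of the hand computation.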


15. Power sums and the absolutely largest root. The behavior of
the moments of high order allows conclusions concerning the
absolute value of the most remote root. Let us write the generally
complex roots in polar form:

$$x_i = r_ie^{i\theta_i} \tag{1-15.1}$$

Then

$$S_m = r_1^me^{im\theta_1}\left[1 + \left(\frac{r_2}{r_1}\right)^me^{im(\theta_2 - \theta_1)} + \cdots + \left(\frac{r_n}{r_1}\right)^me^{im(\theta_n - \theta_1)}\right] \tag{1-15.2}$$

If $r_1$ is the most remote root and all the other $r_i$ are smaller than $r_1$,
then the ratios $r_i/r_1$ ($i = 2, \cdots, n$), raised to an increasingly high power,
converge to zero, and thus for very large m,

$$S_m \approx r_1^me^{im\theta_1}$$

and

$$r_1 = \sqrt[m]{|S_m|} \tag{1-15.3}$$


The successive power sums have thus the valuable property that
they single out the absolutely largest root with ever-increasing strength.
If the roots of an algebraic equation are of slightly different orders of
magnitude, this difference becomes greatly magnified in the power
sums of high order. For example, the ratio 1.5 : 1 is increased in
ten steps to the ratio $(1.5)^{10} : 1 = 57.7$. The largest root will thus
greatly overshadow all the other roots. It is possible, however, that
more than one root will have the same absolutely largest distance $r_1$
from the origin. In fact, if the algebraic equation has real coefficients,
we know in advance that the complex roots always appear in pairs
a + ib and a − ib, and thus at least two roots lie on the maximum
circle $r = r_1$. The successive power sums $S_m$ then show preference
for one pair of roots. Generally in the $S_m$ of high order only the
absolutely largest roots will be practically present, while the absolutely
smaller roots are practically obliterated. If the absolutely largest
root is real, it is quickly obtainable by the ratio of two successive $S_m$:

$$x_1 = \frac{S_{m+1}}{S_m} \tag{1-15.4}$$

The mere observance of a few consecutive $S_m$ helps to spot this
situation. The sign of the $S_m$ is then either constantly positive or
constantly alternating. Moreover, the ratio (4) remains approximately
the same if two neighboring ratios $S_{m+1}/S_m$ and $S_{m+2}/S_{m+1}$
are used.

In the more frequent case of complex roots such a regularity in the
size and sign of the $S_m$ cannot be observed. A large value may be
followed by a very small value, and the sign may change capriciously.
A behavior of this kind indicates that the absolutely largest root is
complex and of the form a ± ib. The associated $S_m$ is now of the
form

$$S_m = 2r^m\cos m\theta \tag{1-15.5}$$


The last factor is responsible for the irregular changes in sign and 
size. However, if the maximum circle contains only one pair of 
complex roots, we can again succeed with approximate localization 
of the largest root. We consider the determinant equation 


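$$\begin{vmatrix}
1 & \lambda & \lambda^2 & \cdots & \lambda^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{vmatrix} = 0 \tag{1-15.6}$$

which is valid for $\lambda = x_1, x_2, \cdots, x_n$. If this determinant is
premultiplied by the determinant

$$\begin{vmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & w_1 & w_2 & \cdots & w_n \\
0 & w_1x_1 & w_2x_2 & \cdots & w_nx_n \\
\vdots & \vdots & \vdots & & \vdots \\
0 & w_1x_1^{n-1} & w_2x_2^{n-1} & \cdots & w_nx_n^{n-1}
\end{vmatrix} \tag{1-15.7}$$

we obtain the equation

$$\begin{vmatrix}
1 & \lambda & \lambda^2 & \cdots & \lambda^n \\
W_0 & W_1 & W_2 & \cdots & W_n \\
W_1 & W_2 & W_3 & \cdots & W_{n+1} \\
\vdots & \vdots & \vdots & & \vdots \\
W_{n-1} & W_n & W_{n+1} & \cdots & W_{2n-1}
\end{vmatrix} = 0 \tag{1-15.8}$$

where the $W_k$ denote the "weighted power sums"

$$W_k = w_1x_1^k + w_2x_2^k + \cdots + w_nx_n^k \tag{1-15.9}$$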


Now we can generate the equation (8) in successive steps, starting 
with the quadratic equation 


$$\begin{vmatrix}
1 & \lambda & \lambda^2 \\
W_0 & W_1 & W_2 \\
W_1 & W_2 & W_3
\end{vmatrix} = 0 \tag{1-15.10}$$


and constantly adding two more rows and columns of the original 
determinant. In this successive procedure each new step generates 
one additional pair of roots and at the same time corrects the roots 
previously obtained. Generally we cannot expect that these corrections
will remain small. But this is actually the case if it so happens
that the weights $w_1, w_2, \cdots$ are strongly biased in favor of the absolutely
largest roots. If the weights $w_1, w_2$ dominate in comparison to the
other weights, then the solution of the simple quadratic equation (10)
will already approximate the absolutely largest pair of roots and the 
later phases of the process merely add small corrections. 

Let us now assume that the absolutely largest complex root
outdistances the others by a reasonably large factor, e.g., 1.5 or
more. We consider four consecutive power sums $S_m$, belonging to
the subscripts m, m + 1, m + 2, m + 3, where m is not below 9.
These four $S_m$ can be considered the four $W_k$ of equation (10):

$$\begin{aligned}
W_0 &= w_1 + w_2 + \cdots + w_n \\
W_1 &= w_1x_1 + w_2x_2 + \cdots + w_nx_n \\
W_2 &= w_1x_1^2 + w_2x_2^2 + \cdots + w_nx_n^2 \\
W_3 &= w_1x_1^3 + w_2x_2^3 + \cdots + w_nx_n^3
\end{aligned} \tag{1-15.11}$$

with

$$w_1 = x_1^m, \quad w_2 = x_2^m, \quad \cdots, \quad w_n = x_n^m \tag{1-15.12}$$

The conditions for applicability of the simple quadratic (10) are thus
fulfilled. We form the ratios

$$s_1 = \frac{S_m}{S_{m+1}}, \quad s_2 = \frac{S_{m+2}}{S_{m+1}}, \quad s_3 = \frac{S_{m+3}}{S_{m+1}} \tag{1-15.13}$$

and solve the quadratic equation

$$\begin{vmatrix}
1 & \lambda & \lambda^2 \\
s_1 & 1 & s_2 \\
1 & s_2 & s_3
\end{vmatrix} = 0 \tag{1-15.14}$$

which means

$$(1 - s_1s_2)\lambda^2 + (s_1s_3 - s_2)\lambda + (s_2^2 - s_3) = 0 \tag{1-15.15}$$

The (usually complex) roots of this equation establish the absolutely
largest root with fair accuracy. In the case of real roots we keep only
the larger of the two roots.

We demonstrate the method numerically by applying it to the
example of § 14. If the movable strip technique of the table (1-14.6)
is continued up to m = 12, the last four power sums become

$$S_9 = 61829.6, \quad S_{10} = -198790.6, \quad S_{11} = 580274.3, \quad S_{12} = -1502225$$

hence

$$s_1 = -0.31103, \quad s_2 = -2.91902, \quad s_3 = 7.55682$$

and the quadratic equation (15) becomes

$$0.092097\lambda^2 + 0.56862\lambda + 0.96386 = 0$$

$$\lambda = -3.087 \pm 0.9673i$$

The correct root of (14.5) is

$$x = -3 \pm i$$


and we see that the error is not more than 3%. Hence we obtained a 
root which is close enough to the true value to make Newton’s 
method (cf. § 5) applicable. 
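The passage from four consecutive power sums to the quadratic (1-15.15) may be sketched as follows; the name `dominant_pair` is ours, and complex arithmetic is used so that real and complex roots are handled alike.

```python
import cmath


def dominant_pair(s_m, s_m1, s_m2, s_m3):
    """Roots of (1-15.15) formed from the power sums S_m, S_{m+1}, S_{m+2}, S_{m+3}."""
    s1 = s_m / s_m1
    s2 = s_m2 / s_m1
    s3 = s_m3 / s_m1
    a = 1.0 - s1 * s2
    b = s1 * s3 - s2
    c = s2 * s2 - s3
    d = cmath.sqrt(b * b - 4.0 * a * c)
    return (-b + d) / (2.0 * a), (-b - d) / (2.0 * a)


lam1, lam2 = dominant_pair(61829.6, -198790.6, 580274.3, -1502225.0)
```

With the four power sums $S_9, \cdots, S_{12}$ quoted above, the two values returned are the conjugate pair −3.087 ± 0.9673i of the text.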


16. Estimation of the largest absolute value. The assumption that 
the absolute value of the largest root will be at least 1.5 times as large 
as the next largest root will not always be true. We have to be 
prepared for the possibility that two or even more pairs of complex 
roots will lie nearly on the same circle of the complex plane. In the 
extreme case all the complex roots of an algebraic equation may 
have nearly the same absolute value. Even in such cases the power
sums $S_m$ contain valuable information concerning the location of the
absolutely largest roots. The radius of the maximum circle can be
ascertained with sufficient accuracy, even if two or more pairs of
roots lie near to that circle. Hence we can proceed as follows. We
first obtain a close value for the absolute value r of the largest pair or
pairs of roots. Then we proceed to localization of the angle θ (or
several angles $\theta_i$) associated with the complex roots which lie near
to the maximum circle. For this second half of the problem two
procedures will be considered. One is based on the properties of
"hidden periodicities," while the other uses a transformation known
as the "transformation by reciprocal radii."

The factor cos mθ in the expression (15.5) interferes with a simple
determination of r on the basis of the power sums $S_m$. In the absence
of this factor we could obtain r with the help of the equation

$$\log r = \frac{1}{m}\log\frac{|S_m|}{2} \tag{1-16.1}$$

With the proper precaution this equation can still be used, in spite of
the disturbing factor. No matter what θ is, it will inevitably happen
that mθ comes near to a certain multiple of π. If this happens, the
factor cos mθ will be near to ±1. We can watch out for such
opportunities by spotting the peak values of $S_m$. Starting with
m = 10 we can take the logarithm of $|S_m|$ and divide by m. We go
up until m = 16 and select the maximum of these ratios. For this
maximum we form (1) and obtain r. This method is applicable even
if more than one pair of complex roots happen to be near the maximum
circle. Even an error of 100% in $S_{16}$, for example, caused by
the presence of a second pair of complex roots, will not vitiate r by
more than 4%, since the sixteenth root of 2 is 1.0443. Hence it will
always be possible to obtain a reasonably accurate value for the
radius of the maximum circle, without going to an unduly large order
in the tabulation of the $S_m$.

A numerical example will help in demonstrating the general 
procedure. We choose an equation of sixth order whose root 
factors are known. The following sixtic equation: 


$$\xi^6 - 2\xi^5 + 6\xi^4 - 58\xi^3 + 633\xi^2 - 200\xi + 2500 = 0 \tag{1-16.2}$$

has three pairs of conjugate complex roots, located as follows:

$$\xi = \pm 2i, \quad -3 \pm 4i, \quad 4 \pm 3i$$

In harmony with our general policies, we first normalize the absolute
term to the order of magnitude 1. For this purpose we divide the
coefficients of the equation by the successive powers of 4, according
to the substitution

$$\xi = 4x \tag{1-16.3}$$

This yields the new equation

$$x^6 - 0.5x^5 + 0.375x^4 - 0.90625x^3 + 2.47266x^2 - 0.19531x + 0.61035 = 0 \tag{1-16.4}$$


The movable strip technique displayed in § 14 yields in succession 
the following power sums: 


6, 0.5, −0.5, 2.28125, −8.10939, −5.623069,
−0.0312454, −11.299692, 10.068455, 20.171028,          (1-16.5)
−0.001920, 32.925676, 7.659814


The peak value $S_{11} = 32.925676$ leads to a particularly large r and
will thus be retained; the other $S_m$ are discarded. We divide by 2
and take the logarithm.

$$\log r_0 = \tfrac{1}{11}\log 16.463 = 0.1106$$

This gives

$$r_0 = 1.290$$


which is a satisfactory approximation of the correct r = 1.25. 
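The peak value method can be stated compactly. The sketch below uses our own name `peak_radius`; it selects, among the orders from a chosen starting value upward, the one with the largest ratio log |S_m| / m and then forms (1-16.1) for that order.

```python
import math


def peak_radius(sums, first=10):
    """Estimate the radius of the maximum circle from the power sums S_0, S_1, ...

    The order m >= first with the largest log|S_m| / m is retained,
    and r is then obtained from |S_m| = 2 r^m.
    """
    candidates = [m for m in range(first, len(sums)) if sums[m] != 0.0]
    m_best = max(candidates, key=lambda m: math.log10(abs(sums[m])) / m)
    return (abs(sums[m_best]) / 2.0) ** (1.0 / m_best)


sums_16_5 = [6, 0.5, -0.5, 2.28125, -8.10939, -5.623069,
             -0.0312454, -11.299692, 10.068455, 20.171028,
             -0.001920, 32.925676, 7.659814]
r0 = peak_radius(sums_16_5)
```

For the table (16.5) the order m = 11 is selected and `r0` comes out as 1.290, in agreement with the hand computation.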


17. Scanning of the unit circle. A complex number

$$x = re^{i\theta}$$

is characterized by magnitude and direction. Even if r is already
in our possession, we still have to find the angle θ associated with a
complex root. If several roots are located in the vicinity of the
maximum circle, all the corresponding $\theta_i$ will have to be found.
Moreover, we will try to improve on the preliminary value of r,
found by the previous peak value method.

Our first move will be to change the maximum circle to the unit
circle by the scale transformation

$$x = r_0u \tag{1-17.1}$$

This transformation has the consequence that the moments related
to the new variable u are equal to the old $S_m$ divided by $r_0^m$. In our
numerical example we found for $r_0$ the value 1.29, which can be
rounded off to 1.3. Hence we divide the $S_m$ values of the table (16.5)
by the successive powers of 1.3, taken from Barlow's Tables. This
gives the following new table, if we terminate our sequence with $S_{12}$:

6, 0.38462, −0.29586, 1.03835, −2.83932, −1.51445,
−0.00647, −1.8008, 1.2343, 1.90212, −0.00014, 1.83720,
0.32877          (1-17.2)


Now, assuming that the new maximum circle is exactly 1 (which is 
in fact true only in approximation), the successive moments associated 
with the maximum root become 


$$2, \quad 2\cos\theta, \quad 2\cos 2\theta, \quad 2\cos 3\theta, \quad \cdots, \quad 2\cos m\theta$$

If more than one root lies on the unit circle, each root will be
associated with such a sequence, and an arbitrary $S_m$ becomes

$$S_m = 2\cos m\theta_1 + 2\cos m\theta_2 + \cdots + 2\cos m\theta_p \tag{1-17.3}$$


In practice p will seldom exceed 2 or 3, since the order of the equation 
will seldom exceed 6. However, for the sake of discussion, p can be 
left arbitrary. 


Now $S_m$ can be conceived as the value of a certain function S(t)
of the continuous variable t, at the definite points t = m. We define
this function as follows:

$$S(t) = 2\cos\theta_1t + 2\cos\theta_2t + \cdots + 2\cos\theta_pt \tag{1-17.4}$$

We notice that S(t) is composed of purely periodic components.
Hence the search for the roots of an algebraic equation can be
reformulated as a search for "hidden periodicities" of a function
which is given in equidistant intervals. We will deal with this problem
later in detail (cf. IV, 22). In the general problem each periodic
component has its own amplitude, phase, and frequency. In our
case the amplitude is fixed to 2 in advance; moreover, the "phase"
aspect of the problem is irrelevant, since only cosine functions enter
our considerations. It is the frequency $\omega_i$ in which we are primarily
interested. The various $\omega_i$ of the periodic components correspond to
the unknown angles $\theta_1, \theta_2, \cdots, \theta_p$ of the roots.

We anticipate the results of the later investigation and apply them 
to our present problem. It will be shown later how beneficial the 
"σ smoothing" is in cutting down the otherwise cumbersome "Gibbs
oscillations." The application of the σ factors causes extra labor by
multiplying each $S_m$ by a pretabulated factor. However, the focusing
power of the method is so strongly increased because of this
smoothing procedure (by diminishing the mutual interference of
the various roots on the unit circle) that the additional work is well
justified. We will not go beyond $S_{12}$. Moreover, the weight factor of
$S_{12}$ becomes 0. Hence only 11 multiplications are involved because
of the σ factors which are tabulated as follows:

$$\begin{array}{r|l@{\qquad}r|l}
m & \sigma_m & m & \sigma_m \\ \hline
0 & 1 & 7 & 0.52708 \\
1 & 0.98862 & 8 & 0.41350 \\
2 & 0.95493 & 9 & 0.30010 \\
3 & 0.90032 & 10 & 0.19099 \\
4 & 0.82699 & 11 & 0.08987 \\
5 & 0.73791 & 12 & 0 \\
6 & 0.63662 & &
\end{array} \tag{1-17.5}$$

The table (2) of the $S_m$ values is thus once more modified. We
multiply them in succession with the corresponding $\sigma_m$ factors of the
table (5). The results $\bar S_m$ of the multiplication are arranged in the
following order. We start with $\bar S_0 = 6$ but we divide by 2, and thus
write down 3. We write next in the same horizontal row $\bar S_1, \bar S_2, \cdots$
until we come to $\bar S_6$. Then we continue in the line below, but now
going backward. Hence $\bar S_7$ is lined up with $\bar S_5$, $\bar S_8$ with $\bar S_4$, $\cdots$, finally
$\tfrac{1}{2}\bar S_{12} = 0$ with $\tfrac{1}{2}\bar S_0 = 3$. Then we form the sums and differences of
these two lines. In our numerical example the arrangement looks as
follows:


$$\begin{array}{lrrrrrrr}
 & 3 & 0.38024 & -0.28252 & 0.93484 & -2.34812 & -1.11650 & -0.00412 \\
 & 0 & 0.16511 & -0.00003 & 0.57084 & 0.51037 & -0.94828 & \\ \hline
\text{Sum:} & 3 & 0.54535 & -0.28255 & 1.50568 & -1.83775 & -2.06478 & -0.00412 \\
\text{Diff.:} & 3 & 0.21453 & -0.28249 & 0.36400 & -2.85849 & -0.16822 &
\end{array} \tag{1-17.6}$$


Since the half circle is divided into 12 parts, we are going to scan the 
unit circle in intervals of 15°. First, however, we do the scanning in 
intervals of 30°. We multiply the “sum” row by a cosine matrix 
[cf. (4-13.3)] which in fact coincides with the A, matrix of IV, 13. 
This matrix has pretabulated coefficients and we obtain the following 
scheme. 


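$$\begin{array}{r|rrrrrrr}
k & 3 & 0.54535 & -0.28255 & 1.50568 & -1.83775 & -2.06478 & -0.00412 \\ \hline
0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 1 & 0.866 & 0.5 & 0 & -0.5 & -0.866 & -1 \\
4 & 1 & 0.5 & -0.5 & -1 & -0.5 & 0.5 & 1 \\
6 & 1 & 0 & -1 & 0 & 1 & 0 & -1 \\
8 & 1 & -0.5 & -0.5 & 1 & -0.5 & -0.5 & 1 \\
10 & 1 & -0.866 & 0.5 & 0 & -0.5 & 0.866 & -1 \\
12 & 1 & -1 & 1 & -1 & 1 & -1 & 1 \\ \hline
 & 0.86183 & 6.04209 & 1.79063 & 1.44892 & 6.32142 & 1.64511 & 0.88933
\end{array}$$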


The last line gives the resultant product of the top row with the
successive rows of the matrix. At once we notice two well-pronounced
maxima which belong to k = 2 (30°) and k = 8 (120°), thus
indicating the existence of two complex roots near the unit circle.
We will now refine our scanning by reducing the fundamental interval
to 15°. It is now the "difference" row of the table (6) which comes
into action. The previous multiplication matrix is replaced by a new
preassigned matrix. We need not carry out the complete multiplication
scheme but only the multiplications which will give us the two
neighbors of the maxima found before. Hence in our problem the two
products for k = 1, 3 will be needed, and likewise the two products
for k = 7, 9. The numerical scheme appears as follows:

$$\begin{array}{r|rrrrrr|r}
k & 3 & 0.21453 & -0.28249 & 0.36400 & -2.85849 & -0.16822 & \text{Product} \\ \hline
1 & 1 & 0.9659 & 0.866 & 0.7071 & 0.5 & 0.2588 & 1.74718 \\
3 & 1 & 0.7071 & 0 & -0.7071 & -1 & -0.7071 & 5.87175 \\
5 & 1 & 0.2588 & -0.866 & -0.7071 & 0.5 & 0.9659 & \\
7 & 1 & -0.2588 & -0.866 & 0.7071 & 0.5 & -0.9659 & 2.17974 \\
9 & 1 & -0.7071 & 0 & 0.7071 & -1 & 0.7071 & 5.84523 \\
11 & 1 & -0.9659 & 0.866 & -0.7071 & 0.5 & -0.2588 &
\end{array}$$
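The two multiplication schemes above evaluate one and the same cosine sum at the even and at the odd multiples of 15°. A direct sketch, without the folding into sums and differences, is given below. The names `sigma` and `scan_half_circle` are ours, and the σ factors are assumed to be sin(mπ/n)/(mπ/n) with n = 12, which reproduces the table (1-17.5).

```python
import math


def sigma(m, n):
    """Smoothing factor sin(m*pi/n) / (m*pi/n), with the value 1 for m = 0."""
    if m == 0:
        return 1.0
    t = m * math.pi / n
    return math.sin(t) / t


def scan_half_circle(sums):
    """Return y_k = S~_0 / 2 + sum of S~_m cos(m k pi / n) for k = 0 .. n,
    where S~_m = sigma_m * S_m and n = len(sums) - 1."""
    n = len(sums) - 1
    smoothed = [sigma(m, n) * s for m, s in enumerate(sums)]
    ys = []
    for k in range(n + 1):
        theta = k * math.pi / n
        y = 0.5 * smoothed[0]
        for m in range(1, n):       # the term m = n carries the weight 0
            y += smoothed[m] * math.cos(m * theta)
        ys.append(y)
    return ys


table_17_2 = [6, 0.38462, -0.29586, 1.03835, -2.83932, -1.51445,
              -0.00647, -1.8008, 1.2343, 1.90212, -0.00014, 1.83720,
              0.32877]
ys = scan_half_circle(table_17_2)
```

The entries `ys[2]` and `ys[8]` are the two maxima of the 30° scan, and their odd neighbors `ys[1]`, `ys[3]`, `ys[7]`, `ys[9]` are the four products of the 15° scan. Small deviations in the third decimal from the hand computation are to be expected.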


The two maxima are treated quite independently. First we
concentrate on the maximum at k = 2 and its two neighbors. The
three consecutive ordinates are:

$$y_0 = 1.74718, \quad y_1 = 6.04209, \quad y_2 = 5.87175$$

In order to interpolate for the position of the correct maximum, we
lay a parabola of second order through these three ordinates and
determine the point where the maximum occurs. The solution of this
simple algebraic problem gives the following result. The abscissa of
the maximum occurs at

$$x_m = \frac{1}{2}\,\frac{y_2 - y_0}{2y_1 - (y_0 + y_2)} \tag{1-17.7}$$

while the associated maximum ordinate becomes

$$y_m = y_1 + \tfrac{1}{4}x_m(y_2 - y_0) \tag{1-17.8}$$

In our numerical problem

$$x_m = \frac{1}{2}\,\frac{4.12457}{4.46525} = 0.46185$$

$$y_m = 6.04209 + \tfrac{1}{4} \cdot 0.46185 \cdot 4.12457 = 6.51832$$

Formula (7) assumes the labeling −1, 0, 1 for the abscissas of the 3
consecutive data points. In actual fact the distance between two
neighboring abscissas is 15°, and the middle ordinate belongs to the
angle k = 2, i.e., 30°. Hence the angle value of $x_m$ becomes

$$\theta_m = 30^\circ + 0.46185 \cdot 15^\circ = 36^\circ.928$$
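Formulas (7) and (8) translate directly into code. In the sketch below the name `parabolic_peak` is ours; the three ordinates are assumed to belong to the abscissas −1, 0, 1.

```python
def parabolic_peak(y0, y1, y2):
    """Abscissa and ordinate of the vertex of the parabola through
    (-1, y0), (0, y1), (1, y2), following (1-17.7) and (1-17.8)."""
    xm = 0.5 * (y2 - y0) / (2.0 * y1 - (y0 + y2))
    ym = y1 + 0.25 * xm * (y2 - y0)
    return xm, ym


xm, ym = parabolic_peak(1.74718, 6.04209, 5.87175)
theta_m = 30.0 + 15.0 * xm  # degrees; middle ordinate at 30, spacing 15
```

For the first maximum this gives `xm` = 0.46185, `ym` = 6.51832 and an angle of 36.928 degrees, as found above.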


The same calculation performed for the second maximum at
k = 8 gives the following results.

$$y_0 = 2.17974, \quad y_1 = 6.32142, \quad y_2 = 5.84523$$

$$x_m = \frac{1}{2}\,\frac{3.66549}{4.61787} = 0.39688, \qquad \theta_m = 120^\circ + 0.39688 \cdot 15^\circ = 125^\circ.95$$

$$y_m = 6.32142 + \tfrac{1}{4} \cdot 0.39688 \cdot 3.66549 = 6.68511$$


The maximum ordinate $y_m$ can be used for closer determination of
the absolute value r of the root. If the assumption r = 1 were
correct, $y_m$ would have to be

$$\tfrac{1}{2} + \sigma_1 + \sigma_2 + \cdots + \sigma_{11} = 7.0669$$

But more generally for arbitrary values of r we obtain

$$y_m = \tfrac{1}{2} + \sigma_1r + \sigma_2r^2 + \cdots + \sigma_{11}r^{11} = A(r) \tag{1-17.9}$$

It is not difficult to prepare a numerical table of the function A(r)
which proceeds in sufficiently close intervals of r, for the range
between r = 0.8 and 1.2. The proper r value can then be ascertained
by linear interpolation. This table is given as Table I of the Appendix.
With the help of the tabular values we find that the $y_m$ of the first
maximum is associated with

$$r = 0.9802$$

while the $y_m$ of the second maximum is associated with

$$r = 0.9864$$

We go back to the original variable x, which according to (1)
requires multiplication by $r_0 = 1.3$. We thus obtain the first pair of
complex roots of the equation (16.4) in the form

$$x = 0.9802 \cdot 1.3\,(\cos 36^\circ.93 + i\sin 36^\circ.93) = 1.018 + 0.765i$$

which compares favorably with the correct root 1 + 0.75i, the error
being only 1%.
The second pair of complex roots comes out as follows:

$$x = 0.9864 \cdot 1.3\,(\cos 125^\circ.95 + i\sin 125^\circ.95) = -0.753 + 1.038i$$

while the correct value is −0.75 + i. The error here is 3%.




18. Transformation by reciprocal radii. We will now consider a 
second and numerically even quicker procedure for locating the 
roots along the maximum circle. We start out exactly as before by 
estimating the radius of the maximum circle according to the peak 
value method of § 16. Then we perform the transformation (17.1). 
This transformation is now applied to the algebraic equation itself 
and not to the power sums $S_m$. In fact, the power sums will not be
used beyond the determination of $r_0$.

In § 16 we dealt with the sixtic equation (16.4) and found $r_0 = 1.29$,
which could be rounded off to $r_0 = 1.3$. Dividing the successive
coefficients by the successive powers of 1.3, we obtain the new equation

$$u^6 - 0.38461u^5 + 0.22189u^4 - 0.41249u^3 + 0.86575u^2 - 0.05260u + 0.12645 = 0 \tag{1-18.1}$$


and we know in advance that at least one pair of complex roots will 
lie near the unit circle | u | = 1. 

At this point we introduce a remarkable transformation, widely 
used in the theory of analytical functions. The simple reciprocal 
transformation 


$$u = \frac{1}{v} \tag{1-18.2}$$


considered in the complex plane, has remarkable properties. It gives 
a “conformal mapping” of the plane on itself, transforming the 
outside of the unit circle to the inside, and vice versa. The outstanding 
property of this transformation is that all circles remain circles, 
although with modified radius. Since straight lines can be conceived 
as circles with an infinite radius, the straight lines are also transformed 
into circles. 

For our purposes a shifting of the origin of our variables will be 
of advantage. We consider the transformation 


$$u = \frac{1 - v}{1 + v}, \qquad v = \frac{1 - u}{1 + u} \tag{1-18.3}$$

If the complex variable u moves along the unit circle, the corre- 
sponding variable v moves along the imaginary axis. Hence a root 
along the unit circle is transformed into a purely imaginary root. 
The transformation (3) can easily be accomplished if f(u) is a 
polynomial in u of the order n. We then get, apart from the factor 


$(1 + v)^n$ which is of no interest for our present purposes, a new
polynomial in v. The coefficients of this polynomial are in a linear
relation to the original coefficients $a_i$ and are obtainable by multiplying
the $a_i$ by a definite matrix $B_n$ of n + 1 rows and n + 1 columns.

This matrix can be pretabulated for every n (cf. Table II of
the Appendix); it is composed of integers. For example for n = 1, 2,
and 3 we get

$$B_1 = \begin{pmatrix} -1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
B_2 = \begin{pmatrix} 1 & -1 & 1 \\ -2 & 0 & 2 \\ 1 & 1 & 1 \end{pmatrix}, \qquad
B_3 = \begin{pmatrix} -1 & 1 & -1 & 1 \\ 3 & -1 & -1 & 3 \\ -3 & -1 & 1 & 3 \\ 1 & 1 & 1 & 1 \end{pmatrix}$$



These matrices can easily be generated in succession, because of the 
recurrence relation 


$$b_{ik}^{(n+1)} = b_{i-1,k}^{(n)} - b_{ik}^{(n)} \tag{1-18.4}$$


For example, the element (21) (i.e., second row, first column) of the
third matrix is obtainable by taking the element (11) (first row, first 
column) of the second matrix and subtracting the element (21) 
(second row, first column): 1 — (—2) = 3. The last column to be 
added always repeats the first column, but with uniformly positive 
signs. 

An interesting property of the B matrices is that their square is 
always proportional to the unit matrix: 


$$B^2 = 2^n \cdot I \tag{1-18.5}$$


This means that multiplying any row of the B matrix by any column
of the same matrix gives $2^n$ in the diagonal and 0 everywhere else.
For example, in the third matrix the product of row 2 and column 2
gives

$$3 + 1 + 1 + 3 = 8$$

while the product of row 2 and column 3 gives

$$-3 + 1 - 1 + 3 = 0$$
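Instead of the recurrence relation, the B matrices can also be generated by expanding the products $(1 - v)^{n-k}(1 + v)^k$ directly. The following is a sketch along these lines; the name `b_matrix` is ours, and the coefficients are assumed to be ordered from the highest power down.

```python
def b_matrix(n):
    """Integer matrix which takes the coefficients a_0 .. a_n of f(u) into
    those of (1 + v)^n f((1 - v)/(1 + v)), highest power first."""
    def poly_mul(p, q):
        out = [0] * (len(p) + len(q) - 1)
        for i, pi in enumerate(p):
            for j, qj in enumerate(q):
                out[i + j] += pi * qj
        return out

    columns = []
    for k in range(n + 1):
        # a_k multiplies u^(n-k), which becomes (1 - v)^(n-k) (1 + v)^k.
        col = [1]
        for _ in range(n - k):
            col = poly_mul(col, [-1, 1])  # the factor (1 - v)
        for _ in range(k):
            col = poly_mul(col, [1, 1])   # the factor (1 + v)
        columns.append(col)
    # Row i holds the weights of a_0 .. a_n in the coefficient of v^(n-i).
    return [[columns[k][i] for k in range(n + 1)] for i in range(n + 1)]


B3 = b_matrix(3)
```

`B3` coincides with the third matrix displayed above, and multiplying it by itself gives 8 in the diagonal and 0 elsewhere, in accordance with (1-18.5).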
We multiply the coefficients of our numerical problem (1)

1, −0.38461, 0.22189, −0.41249, 0.86575, −0.05260, 0.12645

by the matrix n = 6 of Table II. The multiplication occurs in row by
row fashion. We thus get a transformed set of coefficients which
belong to a polynomial of sixth order in v:

$$3.06379v^6 - 5.28162v^5 + 16.75769v^4 - 20.04644v^3 + 14.86053v^2 - 2.62554v + 1.36439 = 0 \tag{1-18.6}$$

(Numerical check: sum of new coefficients = $2^n$ times old absolute
term. This check is absolute, since the multiplications by integers do
not introduce any rounding errors. In our example: $\sum a_i = 8.09280 = 64 \cdot 0.12645$.)




19. Roots near the imaginary axis. The root which was near the 
unit circle in the variable u is now near the imaginary axis in the new 
variable v. But if the root of an algebraic equation with real 
coefficients is near the imaginary axis, such a root can easily be 
located. Assuming first that the root lies exactly on the imaginary 
axis, the given equation splits into two equations, since the even 
powers alone give one equation and the odd powers alone give 
another equation. We will put

$$v^2 = -\lambda \tag{1-19.1}$$

and obtain for λ two algebraic equations whose order is not more than
one-half of the previous order. Moreover, in the new problem we
search for real roots only. Our task is thus greatly simplified.

The imaginary part of the equation provides a linear equation in
the case of a quartic, and even in the case of a sixtic, the equation
in λ is still only quadratic and can be solved by the standard formula.
Only positive roots are of interest to us.

The imaginary part (odd powers) of the numerical example (1-18.6)
gives for λ,

$$5.28162\lambda^2 - 20.04644\lambda + 2.62554 = 0$$

This equation has two positive roots:

$$\lambda_1 = 0.13583, \quad \lambda_2 = 3.65967$$

Which of these two roots will be of interest to us cannot be told in
advance. We have to substitute this λ in the real part of the equation
and see whether the remainder is small or not. In our case both $\lambda_i$
give small remainders, thus indicating that two separate pairs of
complex roots are near the imaginary axis.
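For a sixtic the separation into odd and even powers may be sketched as follows. The name `lambda_candidates` is ours; the coefficient of $v^5$ is assumed to be different from zero, and the routine merely reports the value of the even part for each positive λ, leaving the judgment of its smallness to the reader.

```python
import math


def lambda_candidates(coeffs):
    """For a sixtic in v (coefficients highest power first) solve the odd part
    for lambda = -v^2 and return pairs (lambda, value of the even part)."""
    a0, a1, a2, a3, a4, a5, a6 = coeffs
    # Odd part: a1 v^5 + a3 v^3 + a5 v = v (a1 lam^2 - a3 lam + a5).
    disc = a3 * a3 - 4.0 * a1 * a5
    if disc < 0.0:
        return []
    root = math.sqrt(disc)
    result = []
    for lam in ((a3 - root) / (2.0 * a1), (a3 + root) / (2.0 * a1)):
        if lam > 0.0:
            # Even part a0 v^6 + a2 v^4 + a4 v^2 + a6 at v^2 = -lam.
            even = -a0 * lam ** 3 + a2 * lam ** 2 - a4 * lam + a6
            result.append((lam, even))
    return sorted(result)


pairs = lambda_candidates([3.06379, -5.28162, 16.75769, -20.04644,
                           14.86053, -2.62554, 1.36439])
```

For the equation (1-18.6) the two values of λ returned are 0.13583 and 3.65967, and the even part at the smaller one comes out near −0.3526.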

An added advantage of this method is that the correction of the 
preliminary root by Newton’s scheme (cf. § 5) becomes greatly 
simplified. The "movable strip" used in evaluating $f(x_0)$, $f'(x_0)$ and
$\tfrac{1}{2}f''(x_0)$ is now composed of the numbers

$$-\lambda, \quad 0, \quad 1$$


in vertical arrangement. In view of the middle zero, the number of 
multiplications is halved, and the calculating load considerably 
reduced. 
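The effect of the middle zero in the strip can be seen in a short sketch of synthetic division by $v^2 + \lambda$. This is an illustration in Python, not the author's tabular layout; the function name `divide_by_quadratic` and the coefficient list (taken from the $f(v)$ column of the scheme in this section) are assumptions made for the example.

```python
def divide_by_quadratic(coeffs, lam):
    """Synthetic division of a polynomial (highest power first) by v**2 + lam.

    Returns (quotient, (r1, r0)) such that
    f(v) = quotient(v) * (v**2 + lam) + r1 * v + r0.
    """
    b = []
    for k, a in enumerate(coeffs):
        # Strip (-lam, 0, 1): only the entry two places back contributes,
        # so each new number costs a single multiplication.
        prev2 = b[k - 2] if k >= 2 else 0.0
        b.append(a - lam * prev2)
    return b[:-2], (b[-2], b[-1])


f_coeffs = [3.06379, -5.28162, 16.75769, -20.04644, 14.86053, -2.62554, 1.36439]
quotient, (r1, r0) = divide_by_quadratic(f_coeffs, 0.13583)
print(quotient, r1, r0)
```

Since $v_0^2 + \lambda = 0$, the value of the polynomial at $v_0$ is simply `r1 * v0 + r0`.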




We display the substitution scheme for the case of the smaller
root $\lambda_1$. The movable strip, in vertical arrangement, is -0.13583, 0, 1.


     Quot.        f(v)    Remainder        Quot.       f'(v)    Remainder
   3.06379     3.06379
  -5.28162    -5.28162                  18.38274    18.38274
  16.34153    16.75769                 -26.40810   -26.40810
 -19.32904   -20.04644                  64.53382    67.03075
  12.64086    14.86053                 -56.55231   -60.13932
   0          -2.62554    -0.00008       0          29.72106    20.95543
   0           1.36439    -0.35262       0          -2.62554     5.05596

     Quot.     ½f''(v)    Remainder
  45.95685    45.95685
 -52.81620   -52.81620
  94.30380   100.54612
   0         -60.13932   -52.96530
   0          14.86053     2.05124


$$v_0 = \sqrt{0.13583}\,i = 0.36855i$$
$$f(v_0) = -0.3526$$
$$f'(v_0) = 7.7231i + 5.0560$$
$$\tfrac{1}{2}f''(v_0) = -19.5204i + 2.0512$$

$$\frac{1}{h} = \frac{5.0560 + 7.7231i}{0.3526} + \frac{2.0512 - 19.5204i}{5.0560 + 7.7231i}$$
$$= (14.338 + 21.902i) - (1.719 + 1.344i) = 12.619 + 20.558i$$
$$h = 0.02169 - 0.03533i$$
$$v = v_0 + h = 0.02169 + 0.33322i$$

Transforming back to the original variable $u$ we obtain

$$u = \frac{1 - v}{1 + v} = \frac{2}{1 + v} - 1 = \frac{2}{1.02169 + 0.33322i} - 1 = 0.76933 - 0.57706i$$


Finally we return to the original x by multiplying by 1.3. 


x = (0.76933 — 0.57706i)1.3 = 1.0001 — 0.7502i 
[correct: 1 — 0.75i] 
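The whole correction for the smaller root can be retraced with complex arithmetic. The sketch below is written in Python and rests on several assumptions that should be checked against § 5 and (1-18.3): that the starting value is $v_0 = i\sqrt{\lambda_1}$, that the second-order Newton correction has the form $1/h = -f'/f + \tfrac{1}{2}f''/f'$ (which is how the displayed arithmetic reads), and that the back-transformation is $u = (1 - v)/(1 + v)$. It evaluates the Taylor terms by repeated Horner division by $(v - v_0)$ rather than by the movable strip; `taylor_terms` is a name chosen here.

```python
import cmath

f_coeffs = [3.06379, -5.28162, 16.75769, -20.04644, 14.86053, -2.62554, 1.36439]


def taylor_terms(coeffs, z):
    """Return f(z), f'(z) and f''(z)/2 by three successive Horner divisions."""
    work = list(coeffs)
    terms = []
    for _ in range(3):
        acc = 0j
        reduced = []
        for a in work:
            acc = acc * z + a
            reduced.append(acc)
        terms.append(reduced[-1])  # remainder = next Taylor coefficient
        work = reduced[:-1]        # quotient feeds the next pass
    return terms


lam1 = 0.13583
v0 = 1j * cmath.sqrt(lam1)        # assumed substitution v = i*sqrt(lambda)
f0, f1, f2 = taylor_terms(f_coeffs, v0)
inv_h = -f1 / f0 + f2 / f1        # assumed form of the second-order correction
v = v0 + 1 / inv_h
u = (1 - v) / (1 + v)             # assumed back-transformation
x = 1.3 * u                       # undo the scale factor 1.3
print(f0, f1, f2, x)
```

Carried out in full floating-point precision, the result differs from the hand computation in the last printed figures but lands equally close to the root $1 - 0.75i$.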


The correction scheme for the larger root is quite similar. It is
advisable, however, to substitute the reciprocal of $\lambda_2$:


$$\bar\lambda = 0.27325, \qquad \bar v_0 = \sqrt{0.27325}\,i = 0.52273i$$


into the inverted polynomial

$$1.36439\bar v^6 - 2.62554\bar v^5 + 14.86053\bar v^4 - \cdots + 3.06379$$

We thus obtain, if we perform the calculations,

$$h = 0.02448 - 0.02291i$$
$$\bar v = 0.02448 + 0.49982i$$
$$u = \frac{\bar v - 1}{\bar v + 1} = 1 - \frac{2}{\bar v + 1} = 1 - \frac{2}{1.02448 + 0.49982i} = -0.57688 + 0.76932i$$
$$x = 1.3(-0.57688 + 0.76932i) = -0.7499 + 1.0001i$$


[correct: -0.75 + i]


20. Multiple roots. Whenever the remainder $f(x_0)$ is small,
Newton’s scheme will give a powerful correction, provided, however,
that $f'(x_0)$ is not too small at the same time. Special precautions are
demanded, however, if it so happens that not only $f(x_0)$ but also
$f'(x_0)$ is small so that the ratio $f(x_0)/f'(x_0)$ is not small. In such
a case, the smallness of $f'(x_0)$ indicates that we are near to a double
root, or to two roots very close together. The separation of these
roots cannot be obtained by a purely linear interpolation. We have
to evaluate $f(x_0)$, $f'(x_0)$, and $\tfrac{1}{2}f''(x_0)$ and solve the quadratic equation


$$\tfrac{1}{2}f''(x_0)h^2 + f'(x_0)h + f(x_0) = 0 \tag{1-20.1}$$


by the standard formula. The two values of $h$, if added to $x_0$, will
approximate the two roots closely. The $x_i$ thus obtained can now be used
for further refinement, by the regular technique described in § 5.
Going through the routine independently for both roots, we can 
obtain each root with any accuracy we want, after the first approxima- 
tion brought us so close to each respective root that the damaging 
influence of the other root is eliminated. 
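A small numerical experiment shows why the quadratic (1-20.1) is needed. The Python sketch below uses an invented cubic, $(x - 1)(x - 1.02)(x + 2)$, which is not taken from the text, and starts midway between the two close roots; `taylor_terms` is an illustrative helper that obtains $f$, $f'$ and $\tfrac{1}{2}f''$ by repeated Horner division.

```python
import cmath


def taylor_terms(coeffs, x0):
    """Return f(x0), f'(x0) and f''(x0)/2 by three successive Horner divisions."""
    work = list(coeffs)
    terms = []
    for _ in range(3):
        acc = 0.0
        reduced = []
        for a in work:
            acc = acc * x0 + a
            reduced.append(acc)
        terms.append(reduced[-1])
        work = reduced[:-1]
    return terms


# Invented test case: (x - 1)(x - 1.02)(x + 2) = x^3 - 0.02x^2 - 3.02x + 2.04
coeffs = [1.0, -0.02, -3.02, 2.04]
x0 = 1.01
f0, f1, f2 = taylor_terms(coeffs, x0)

newton_step = -f0 / f1                       # linear correction: far too large here
disc = cmath.sqrt(f1 * f1 - 4.0 * f2 * f0)   # quadratic (1-20.1) in h
h_values = [(-f1 - disc) / (2.0 * f2), (-f1 + disc) / (2.0 * f2)]
roots = sorted((x0 + h).real for h in h_values)
print(newton_step, roots)
```

The linear step is of order 3 although the nearest roots are only 0.01 away, while the two solutions of the quadratic already separate the pair and can each be refined by the ordinary scheme.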

A similar procedure is demanded if it so happens that not only
$f(x_0)$ and $f'(x_0)$ but even $f''(x_0)$ is so small that the $h$ calculated from
(1) is not small. In that case we have to add $\tfrac{1}{6}f'''(x_0)$ and solve the
cubic


$$\tfrac{1}{6}f'''(x_0)h^3 + \tfrac{1}{2}f''(x_0)h^2 + f'(x_0)h + f(x_0) = 0 \tag{1-20.2}$$


in order to separate the three roots, which are now very close
together. Subsequent application of Newton’s scheme to every single
root thus obtained will finally correct each root to the desired
accuracy.

We see that the separation of closely bunched multiple roots is 
generally a cumbersome task which cannot be accomplished without 
a great deal of computational work. 


21. Algebraic equations with complex coefficients. Let $F_n(x)$ be
a polynomial with complex coefficients. The roots of such a
polynomial are all complex but they do not appear in pairs of the form
$\alpha \pm \beta i$. The method of the power sums is still available, but the
evaluation of the higher moments is greatly slowed down by the
complex character of the coefficients. The product of two complex
numbers requires four multiplications, and thus the time of obtaining
the radius of the maximum circle is greatly increased.

It is frequently preferable to avoid the operation with complex
numbers altogether by multiplying the polynomial $F_n(x)$ by its
complex conjugate $F_n^*(x)$, where the symbol * indicates that every $i$
in $F_n(x)$ is changed to $-i$. We form the polynomial of the order $2n$,


$$f_{2n}(x) = F_n(x)F_n^*(x) \tag{1-21.1}$$


which has real coefficients. This introduces $n$ extraneous roots,
since every $\alpha + i\beta$ is now complemented by the corresponding
$\alpha - i\beta$. But we have the great advantage that the operation with
complex coefficients is avoided. We operate solely with the real
$f_{2n}(x)$ and obtain its $n$ pairs of conjugate complex roots $x_k = \alpha_k \pm i\beta_k$.
Now the decision has to be made whether the sign of the imaginary
part is plus or minus. We do that by substituting $x_k$ into the original
$F_n(x)$. The movable strip has still real coefficients. Hence we can
take the real part of $F_n(x)$ alone and the imaginary part of $F_n(x)$
alone and go through the procedure independently, avoiding any
complex numbers. The remainders combined appear in the form


$$(a + ib)x + (c + id) \tag{1-21.2}$$


where the coefficients a, b, c, d are not small. By putting this
remainder equal to zero we obtain a root which will be very near
to either $\alpha_k + i\beta_k$ or $\alpha_k - i\beta_k$. We thus decide the right sign and get
at the same time an additional check on the accuracy of the root.
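The device of multiplying by the conjugate polynomial can be illustrated on a small invented example. The Python sketch below is a simplification and not the procedure of the text: instead of forming the linear remainder (1-21.2) with a real movable strip, it evaluates the complex polynomial directly at both candidates and keeps the one with the smaller residual. The quadratic `F` and the helper names are assumptions made for the illustration.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (same ordering)."""
    out = [0j] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def horner(coeffs, z):
    """Evaluate a polynomial (highest power first) at z."""
    acc = 0j
    for a in coeffs:
        acc = acc * z + a
    return acc


# Invented example: F(x) = (x - (1 + 2i)) (x - (-0.5 - i)), highest power first.
F = [1, -(0.5 + 1j), 1.5 - 2j]
F_conj = [complex(a).conjugate() for a in F]
f_real = poly_mul(F, F_conj)   # fourth-order polynomial with real coefficients


def pick_sign(alpha, beta):
    """Choose between alpha + i*beta and alpha - i*beta by the residual of F."""
    candidates = (complex(alpha, beta), complex(alpha, -beta))
    return min(candidates, key=lambda z: abs(horner(F, z)))


root_a = pick_sign(1.0, 2.0)
root_b = pick_sign(-0.5, 1.0)
print(f_real, root_a, root_b)
```

The product comes out as $x^4 - x^3 + 4.25x^2 + 2.5x + 6.25$, whose conjugate pairs $1 \pm 2i$ and $-0.5 \pm i$ each contain exactly one root of the original polynomial.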




22. Stability analysis. Whenever a mechanical or electrical 
system is in a state of equilibrium, small disturbing forces will tend 
to modify this equilibrium. Since the potential energy in the 
neighborhood of a state of equilibrium is always a positive definite 
form of the displacements, the system will perform small oscillations 
around the state of equilibrium. These oscillations can be conceived 
as a linear superposition of n “normal oscillations.” The frequencies
of these oscillations are determined by solving the “characteristic
equation” associated with the dynamic system. While frictional forces
tend to cause damping of these oscillations and thus return the
system to its state of equilibrium, the situation is different if an
energy source is present in the system which can counteract the 
effect of frictional forces and even build up the amplitudes to ever- 
increasing amounts. The general problem of stability thus involves 
an algebraic equation whose complex roots may reveal either positive 
or negative damping, depending on the sign of the real parts of the 
roots. The stability of the system demands that all the roots of the 
characteristic equation shall lie in the negative half plane of the 
complex plane. One single root with a positive real part is sufficient 
to throw the system out of gear. 

Under these circumstances it is of interest to decide whether a 
given algebraic equation possesses roots in the positive half plane or 
not, without going through a detailed quantitative analysis of the 
roots. The English physicist E. J. Routh discovered an elegant
method for testing an algebraic equation as to its stability, without
any calculation of the roots.¹ However, occasionally we want to go
further and actually locate the unstable root if the system is found to
be unstable. For this purpose the transformation by reciprocal radii
is again of great advantage. Let us replace $x$ by $x = -u$ and again
transform to the variable $v$ by the transformation (1-18.3). This
transformation maps the entire left complex half plane $R(x) < 0$
into the unit circle. Hence the condition that all the roots $x_i$ are in
the negative half plane is equivalent to the condition that all the
roots $v_i$ are in absolute value smaller than 1. Hence the radius $r_m$ of
the maximum circle must come out as $< 1$ in order to guarantee the
stability of the system. Since the method of synthetic division makes
the power sums and also $r_m$ easily accessible, we obtain a stability
criterion with relatively little labor, together with a localization of
the unstable root if $r_m$ turns out to be greater than 1.


¹ For a description of Routh’s method see {2}, Vol. I, p. 129; also Ref. [3] of
Chapter II, p. 154.

In transforming the original equation it is important that the equa- 
tion shall be given in a proper scale. We normalize the scale by 
demanding that the absolute term of the equation shall be nearly 1 
(assuming that the coefficient of x” is 1). 

Example 1. The roots of the previously treated quartic had
negative real parts; the polynomial is thus of the “stable” kind. Let
us now see how we can establish this fact without going through the
evaluation of the roots. The original equation is


$$x^4 + 7.60x^3 + 23.34x^2 + 38.44x + 37.40 = 0 \tag{1-22.1}$$


We normalize the absolute term and change $x$ to $u$ by the transformation

$$x = -\frac{u}{0.4} \tag{1-22.2}$$

which means that the successive coefficients of (1) are multiplied by
the powers of 0.4, and the sign of the odd powers is changed to the
opposite.


$$u^4 - 3.04u^3 + 3.7344u^2 - 2.4602u + 0.9574 = 0 \tag{1-22.3}$$


The new absolute term is nearly 1. Multiplication by the $B_4$ matrix
gives the new equation in $v$.


$$11.1920v^4 - 1.3300v^3 + 4.2756v^2 + 0.9892v + 0.1916 = 0 \tag{1-22.4}$$


The stability of this equation can be established by mere inspection, 
without further computation. Assuming that v is an arbitrary 
complex number, we can conceive the terms of an algebraic equation 
as vectors of the complex plane. The fact that a sum of vectors is zero 
means geometrically that the vector polygon closes, i.e., the end point 
returns to the origin. If now we separate the first vector from the 
rest, we have the proposition that the straight line from A to B is 
complemented by a polygon which starts from B and returns to A. 
We see from the minimum property of a straight line that the total 
length of the path from B back to A can never be less than the path 
from A to B. Therefore, if we find that for a certain v, 


$$|a_1 v^{n-1}| + |a_2 v^{n-2}| + \cdots < |a_0 v^n| \tag{1-22.5}$$


we can be sure that equation (22.4) cannot hold. Now if we divide by
$v^n$ and write $\bar v$ for $1/v$, we see at once that


$$11.192 > 1.3300\,|\bar v| + 4.2756\,|\bar v|^2 + 0.9892\,|\bar v|^3 + 0.1916\,|\bar v|^4 \tag{1-22.6}$$

for any $|\bar v| \le 1$, that is, for any $|v| \ge 1$. Generally we can say that
if the coefficient of $v^n$ is larger than the absolute sum of all the
remaining coefficients, the equation in $v$ can have no root outside or
on the unit circle, which means that the system is stable. This simple
criterion (which is sufficient but not necessary for stability) is satisfied
in our case and thus the stability of the equation is established.
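The passage from the equation in $u$ to the equation in $v$, together with the sufficient criterion just stated, can be sketched in a few lines of Python. The sketch assumes that the transformation (1-18.3) is $u = (1 - v)/(1 + v)$, as the back-substitutions of this chapter indicate, and it expands the binomial products directly instead of using a tabulated matrix; `to_v_plane` and `sufficient_stable` are names chosen here.

```python
def poly_mul(p, q):
    """Product of two polynomials given lowest power first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_pow(p, k):
    """k-th power of a polynomial given lowest power first."""
    out = [1.0]
    for _ in range(k):
        out = poly_mul(out, p)
    return out


def to_v_plane(u_coeffs):
    """Coefficients (highest first) of (1+v)^n * P((1-v)/(1+v)) for P given highest first."""
    n = len(u_coeffs) - 1
    low_first = u_coeffs[::-1]
    result = [0.0] * (n + 1)
    for k, c in enumerate(low_first):
        term = poly_mul(poly_pow([1.0, -1.0], k), poly_pow([1.0, 1.0], n - k))
        for m, t in enumerate(term):
            result[m] += c * t
    return result[::-1]


def sufficient_stable(v_coeffs):
    """True if the leading coefficient outweighs the absolute sum of the others."""
    return abs(v_coeffs[0]) > sum(abs(c) for c in v_coeffs[1:])


example1_u = [1.0, -3.04, 3.7344, -2.4602, 0.9574]
example1_v = to_v_plane(example1_u)
print(example1_v, sufficient_stable(example1_v))
```

A return value of `False` from `sufficient_stable` decides nothing, since the criterion is sufficient but not necessary.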


Example 2. The following equation is also of the stable type.


$$x^6 + 4.24x^5 + 8.50x^4 + 10.27x^3 + 8.09x^2 + 3.96x + 0.98 = 0$$


The absolute term is here already properly normalized, and we can
immediately proceed to the transformation to $v$, after changing the
sign of the odd powers. Multiplication by the matrix $B_6$ gives


$$37.04v^6 - 2.06v^5 + 23.30v^4 + 1.24v^3 + 2.92v^2 + 0.18v + 0.10 = 0$$


Once more we see that the coefficient of $v^6$ outdistances the sum of the
remaining coefficients, which shows directly the stability of the
system.

Example 3. The previous polynomial becomes unstable if we
multiply it by the root factor


$$x^2 - 0.6x + 1$$

This gives the eighth-order polynomial,


$$x^8 + 3.64x^7 + 6.956x^6 + 9.41x^5 + 10.428x^4 + 9.376x^3 + 6.694x^2 + 3.372x + 0.98$$


The transformation to the $v$ plane yields


$$51.856v^8 - 2.884v^7 + 128.924v^6 - 3.620v^5 + 64.668v^4 + 3.476v^3 + 7.732v^2 + 0.468v + 0.260 = 0$$


The large coefficients of $v^4$, $v^6$, $v^8$ allow an immediate localization
of the unstable root. Neglecting all the other terms we get the
following quadratic in $v^2 = -y$.


$$51.86y^2 - 128.92y + 64.67 = 0$$

which gives

$$y = 1.243 \pm 0.546$$

The smaller root is $< 1$ and thus stable. The unstable root yields

$$v = \sqrt{-1.79} = \pm 1.33i$$


and going back to the original variable $x$,


$$-x = \frac{1 - v}{1 + v} = \frac{2}{1 + v} - 1 = -0.28 \pm 0.96i$$
$$x = 0.28 \pm 0.96i$$


This is a very close estimate of the true value of the unstable root,
which is $x = 0.3 \pm 0.95i$.
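The localization of Example 3 is short enough to be retraced in Python. The sketch keeps only the three dominant coefficients, as the text does, and again assumes the back-transformation $-x = (1 - v)/(1 + v)$; the variable names are chosen for the illustration.

```python
import math

# Dominant terms 51.856 v^8 + 128.924 v^6 + 64.668 v^4 with v^2 = -y
a, b, c = 51.856, -128.924, 64.668
disc = math.sqrt(b * b - 4.0 * a * c)
y_small = (-b - disc) / (2.0 * a)
y_large = (-b + disc) / (2.0 * a)

v = 1j * math.sqrt(y_large)      # the root with |v| > 1 is the unstable one
x = -(1 - v) / (1 + v)           # assumed back-transformation to x
true_root = complex(0.3, math.sqrt(1.0 - 0.09))   # root of x^2 - 0.6x + 1
print(y_small, y_large, x, true_root)
```

The estimate agrees with the exact root of the factor $x^2 - 0.6x + 1$ to about two figures, which is all that the neglect of the small coefficients can be expected to give.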

Example 4. Conditions are not always so favorable for spotting
unstable roots. In the example given by Doherty-Keller ({2}, I,
p. 129) the following sixtic shall be investigated for stability.


$$x^6 + 68.6x^5 + 785x^4 + 7213x^3 + 50700x^2 + 8200x + 435000 = 0$$

For an approximate normalization of the absolute term we put

$$x = -10u$$

and obtain

$$u^6 - 6.86u^5 + 7.85u^4 - 7.213u^3 + 5.07u^2 - 0.082u + 0.435 = 0$$

The transformation to $v$ gives

$$28.51v^6 - 36.062v^5 + 21.676v^4 - 0.180v^3 - 4.466v^2 + 18.162v + 0.2 = 0$$

The previous criterion for stability fails to hold, since the absolute
sum of the last six coefficients outweighs the first coefficient. Hence
the stable or unstable nature of the equation is not yet decided, and
we have to evaluate a few of the power sums. Dividing by the largest
coefficient and truncating to two decimal places, we obtain


$$v^6 - 1.26v^5 + 0.76v^4 - 0.16v^2 + 0.64v = 0$$


The equation is thus reducible to the fifth order, since the root $v = 0$
(strictly speaking $v$ = very small) is of no interest. Synthetic
division by the movable strip method yields the successive power
sums:


     k        S_k
     0        5
     1        1.26
     2        0.0676
     3       -0.87242
     ...
    10        3.7941
    11        5.5555
    12        6.0846


As soon as a power sum larger than 5 is reached, we can stop,
since this clearly establishes the instability of the system. Indeed, the
successive powers constantly diminish the absolute value of the roots
if all the roots lie inside the unit circle. Hence it is impossible that the
sum $v_1^m + v_2^m + \cdots + v_n^m$ shall ever go beyond $n$ if all the $|v_i|$ are
smaller than 1. If our goal is merely to show the instability of the
system, this goal is already accomplished by having obtained $S_m > 5$.
But if we want to locate the unstable root, we can evaluate a few more
$S_m$ and obtain the unstable root by solving the quadratic equation
(15.14).
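The power sums of Example 4 can be regenerated with a short program. The text obtains them by synthetic division with the movable strip; the Python sketch below uses Newton's identities instead, which is a different bookkeeping that yields the same sums. The function name `power_sums` and the stopping rule coded at the end are illustrative.

```python
def power_sums(monic, count):
    """Power sums S_0..S_count of the roots of a monic polynomial (highest power first)."""
    n = len(monic) - 1
    sums = [float(n)]
    for k in range(1, count + 1):
        total = 0.0
        # Newton's identities: S_k + a_1*S_(k-1) + ... (+ k*a_k while k <= n) = 0
        for j in range(1, min(k - 1, n) + 1):
            total += monic[j] * sums[k - j]
        if k <= n:
            total += k * monic[k]
        sums.append(-total)
    return sums


# Reduced fifth-order equation of Example 4: v^5 - 1.26v^4 + 0.76v^3 - 0.16v + 0.64
reduced = [1.0, -1.26, 0.76, 0.0, -0.16, 0.64]
sums = power_sums(reduced, 12)
first_exceeding = next(k for k, s in enumerate(sums) if s > len(reduced) - 1)
print(sums, first_exceeding)
```

With these coefficients the first power sum to exceed 5 is the eleventh, at which point the instability is established.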

This method of deciding the stability of a system will obviously fail 
if the critical r is very nearly 1. But this means that the root in the 
original variable u is very near to the imaginary axis. Since we have 
seen that roots near to the imaginary axis can easily be located (by 
looking for the real roots of a redundant system of two equations), a 
very slightly unstable system can be spotted anyway and needs no 
further consideration. 


Bibliographical References 


[1] Cf. Ref. {11}, Chapter IX; [2] Cf. Ref. {14}, Chapter VI. 


[3] Bôcher, M., Introduction to Higher Algebra (Macmillan, New
York, 1938).


Article:
[4] Fry, Th. C., “Some Numerical Methods of Locating Roots of
Polynomials,” Quart. Applied Math., 3, 89 (1945).


II 


MATRICES 
AND EIGENVALUE 
PROBLEMS 


1. Historical survey. The solution of simultaneous linear algebraic 
equations has a very old history. The Hindus, who are the inventors 
of the decimal system and of the algebraic method, solved simul- 
taneous linear algebraic equations from the sixth century on. In the 
early stages of the development only the unknowns of the problem 
were denoted by symbols, while the given coefficients of the linear 
system came in with their actual numerical values. The complete 
symbolization of algebra came in the time of the great French 
algebraist François Viète (1540-1603). This symbolization made it 
possible to develop general methods for the solution of systems of 
linear equations. The great philosopher and mathematician Leibniz 
invented the notion of a “determinant” and introduced a notation 
which is essentially the same we use today. He showed how the 
unknowns of any consistent linear algebraic system are obtainable by 
forming the ratio of two determinants. 

During the nineteenth century the operational viewpoint of mathe- 
matics came into the focus of interest. In contrast to arithmetic, 
where only the final numerical answer is of importance, the interest of 
algebra centers around the operations involved in the evaluation of 
the results. Our interest is not what the final numerical answer will 
be but by what operations that answer will be obtained. Hence in a 
purely algebraic problem parentheses have to be removed, fractions 
cleared, multiplications performed, etc., in purely symbolic form. We 
do not care what the numerical value will be of the symbols which 
stand for quantities. Our conclusions are based on the fact that all
numbers satisfy certain general laws, called the “postulates of
algebra.” This makes it possible for us to draw valid conclusions 
without knowing what the actual numerical parameters of the 
problem will be. We need not carry out the complicated algebraic 
operations with the help of the given numbers but with the help of 
symbols which imitate the behavior of numbers. The actual numbers 
enter only the final formula which tells us what operations are to be 
performed with the given numbers in order to get the answer. 

The fundamental operations of arithmetic—addition and sub- 
traction, multiplication and division—are first performed on simple 
numbers only. We start with the integers and gradually extend the 
realm of numbers by introducing the negative numbers and the 
fractions. In algebra two further fundamental operations are added, 
viz., raising to a power and taking the root. These operations give 
rise to a further enlargement of the number system by the discovery 
of the imaginary and complex numbers of the form a + bi. All 
these numbers satisfy the basic postulates of algebra, and thus the 
algebraic operations with all these numbers are equally satisfactory. 
But in 1859 the English mathematician Cayley greatly extended the 
realm of algebra by showing that a “‘matrix,” although composed of a 
system of quantities, can be conceived as one single algebraic operator 
which satisfies all the postulates of ordinary algebra, with the single 
exception of the commutative law of multiplication. The algebra of 
matrices thus demonstrates the operation of a noncommutative 
algebra. The matrix algebra differs, however, in one further feature 
from ordinary algebra. It satisfies its own characteristic equation 
and thus leads to a polynomial identity which has no analogy in the 
algebra of ordinary real or complex numbers. This phase of the 
theory was developed by Sylvester (1851)—originator of the term 
“latent roots’”—and later by Weierstrass (1868). The most complete 
algebraic theory of the characteristic equation was finally given by 
Frobenius (1879). 

Fredholm (1900) introduced the notion of matrices of infinite 
order and extended the algebraic theory of the characteristic equation 
to the case of infinitely many variables. This became the foundation 
of the theory of orthogonal function systems (cf. V, 16) and the 
geometrical treatment of linear differential and integral operators. 

Fields of application. The characteristic equation with the 
associated eigenvalues and eigenvectors plays a fundamental part
in the theory of vibrations, whether these vibrations be of a mechanical
or electric, macroscopic or microscopic kind. The elastic vibrations 
of a bridge or any other solid structure, the flutter vibrations of an 
airplane wing, the transient oscillations of an electric network, the 
wave-mechanical vibrations of atoms and molecules are all examples 
for the operation of the characteristic equation. The buckling of an 
elastic structure is likewise an eigenvalue problem, since buckling 
occurs if the smallest vibrational frequency of the elastic structure 
reaches the value zero. 

In Schroedinger’s wave mechanics the atomic and molecular 
oscillations of particles play a fundamental role in the description of 
the physical and chemical properties of matter. The eigenvalues of 
Schroedinger’s wave equation are directly proportional to the energy 
value of the various quantum states. 

In boundary value problems, expansion of the solution into the 
orthogonal functions associated with the given differential operator 
provides a powerful tool in the problem of constructing the solution 
in terms of the given boundary values. Since the Green’s function 
associated with the given differential operator is frequently not 
available in explicit form, the bilinear expansion gives an indirect 
method of generating the Green’s function. 

The theory of linear differential and integral operators gains 
greatly in clarity and conciseness by introducing the associated 
eigenfunctions as an auxiliary frame of reference. This means in 
geometrical interpretation that the associated quadratic surface is 
transformed to its principal axes. The principal axis transformation 
of quadratic forms becomes thus a fundamental connecting link 
between widely different branches of mathematics. The solution of 
systems of linear algebraic equations, matrix calculus, the general 
theory of linear differential and integral operators, can all be 
conceived as various formulations of the same fundamental 
problem, viz., the principal axis transformation of a quadratic 
surface in a Euclidean space of either finite or infinitely many 
dimensions. 


2. Vectors and tensors. Vectors differ from scalars by
having both magnitude and direction. But vectors are not the only
quantities which go beyond the realm of scalars. A vector can be
analyzed in a certain frame of reference and given with the help of its
components. Thus we may say that a vector $a$ has in three dimensions
the components $a_1, a_2, a_3$, and we may write


$$a = a_1 i + a_2 j + a_3 k$$

but we may also write

$$a = (a_1, a_2, a_3)$$


which indicates that the three numbers $a_1, a_2, a_3$, called the
“components” of the vector, uniquely characterize the vector $a$. During
the development of the natural sciences in the nineteenth century the
discovery was made that vectors represent only a very special class
of directed quantities, called “tensors.” Vectors are the simplest
class of tensors, namely a set of quantities which can be characterized
by one single subscript. But generally the number of indices may be
two, three, or more: $a_{ijk}$.

Of greatest importance among these tensors are the tensors of
second order. A tensor of second order is characterized by two
subscripts: $a_{ik}$. The components of such a tensor can be arranged in
a two-dimensional scheme, in such a way that the first subscript gives
the row and the second subscript the column to which a certain
component belongs.


$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}$$


By putting brackets around the scheme we indicate that we want to 
consider the entire assembly of these components as one single 
integrated unit. None of these components have existence in them- 
selves. It is the entire assembly of all the components which con- 
stitutes a tensor, in a similar way as the entire assembly of 
components constitutes a vector. 


3. Matrices as algebraic quantities. Cayley made the fundamental
discovery that such an array of numbers can be conceived as
one single algebraic quantity A, with which certain algebraic
operations can be performed. We can add, subtract, multiply,
and divide with such an array of numbers, just as if it were a
single algebraic quantity. We will call such an array of numbers a
“matrix” and denote it by


$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$


The quantities $a_{ik}$ are called the “elements” of the matrix A. In
contrast to the “determinant”


$$|A| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} \tag{2-3.1}$$


we do not think of a matrix A as a single number, obtained by a
certain process out of the elements $a_{ik}$ of the matrix. The matrix A
refers merely to the entire assembly of the numbers $a_{ik}$, but arranged
in a very definite manner in rows and columns. Every element $a_{ik}$
has its definite place in the array and cannot be replaced by other
elements. Hence a matrix is defined as an assembly of numbers,
arrayed in a strictly prescribed geometric pattern.

The fact that we can operate with a matrix algebraically means 
that the fundamental operations of algebra, viz., addition, sub- 
traction, multiplication, and division, raising to a power, and taking 
the root can be extended to the realm of matrices. 

The two fundamental operations from which everything else is 
derivable are addition and multiplication. Subtraction is merely the 
inverse of addition, division merely the inverse of multiplication. 
Hence it is enough to know how to add and how to multiply 
matrices. 

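The addition of matrices is a simple operation. Given the two
matrices A and B, with elements $a_{ik}$ and $b_{ik}$,
we can add corresponding elements and obtain the new matrix


$$\begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} + b_{n1} & a_{n2} + b_{n2} & \cdots & a_{nn} + b_{nn} \end{bmatrix} \tag{2-3.3}$$


We define this new matrix C as the sum of the matrices A and B.

$$C = A + B \tag{2-3.4}$$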


More complicated is the operation of multiplication. The clue to this 
operation is given to us by the fact that matrices appear primarily in 
connection with linear equations. A general system of simultaneous 
linear algebraic equations can be written in the following systematic 
manner. 

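$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + a_{n3}x_3 + \cdots + a_{nn}x_n &= b_n \end{aligned} \tag{2-3.5}$$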


This systematic way of writing a system of algebraic equations was
absolutely essential for the development of matrix calculus. In
earlier days the elements of a matrix were denoted by different
letters: a, b, c, ..., without the use of subscripts. Similarly the
unknowns of the problem fell apart into a system of disconnected
quantities x, y, z, .... The ingenious symbolism of the subscripts in
both matrix elements and unknowns was absolutely essential for
development of the theory of matrices, because it made it evident
that the elements of a matrix are in reality the components of a
single tensor of second order, while the unknowns of a set of linear
equations are to be conceived as the components of one single
vector x. Similarly the given right side of the equations has to be
conceived as the components of another vector b. Accordingly we
will write the given set of equations in the form


$$Ax = b \tag{2-3.6}$$


and conceive the left side as the product of the matrix A and the 
vector x. Hence we know already what the operation “multi- 
plication” means if applied to a matrix and a vector. 




This equation brings out an important feature of matrices. Let us 
write the previous equation in reversed sequence: 


b= Ax (2-3.7) 


Considering x as a given vector, we can say that the matrix A 
associates with a given vector x a new vector b. We can also say that 
the matrix A “transforms” the vector x into a new vector b. For 
example, we may rotate the vector x around a certain axis, by a 
certain angle. Such a rotation would be an 
example of multiplication by a certain matrix. 
But the matrix associated with a rigid rotation in 
space is a very special matrix. A general 
matrix transforms the vector x into a new 
position b, but this transformation cannot be 
pictured as a mere rotation. 
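The difference between a rotation and a general matrix shows up already in two dimensions. The following Python sketch is an illustration with invented numbers: `rotation` turns a vector by a right angle and preserves its length, while `general` is an arbitrary matrix that changes both direction and length. The helper names `mat_vec` and `norm` are assumptions of the example.

```python
import math


def mat_vec(A, x):
    """Product b = A x of a matrix (list of rows) and a vector."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in x))


theta = math.pi / 2
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
general = [[3.0, -2.0], [-1.0, 4.0]]   # arbitrary illustrative matrix

x = [1.0, 0.0]
b_rot = mat_vec(rotation, x)   # same length as x, turned by 90 degrees
b_gen = mat_vec(general, x)    # length and direction both changed
print(b_rot, norm(b_rot), b_gen, norm(b_gen))
```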

Let us now start with a vector u and transform it into the vector v, 
with the help of the matrix A. Then we continue the process and 
transform v into a new vector w, with the help of another matrix B.


$$v = Au, \qquad w = Bv \tag{2-3.8}$$


Now w was generated out of v, but v itself was generated out of u. 
Hence we can say that w was generated out of u, leaving out the 
intermediate vector v. If we substitute for v in the previous equation, 


we obtain 
w= BAu (2-3.9) 


Hence if we consider the direct transition from u to w by writing 

w = Cu (2-3.10) 
we see that the matrix C must be conceived as 

C= BA (2-3.11) 


This gives the rule by which the product of two matrices B and A is
obtained. If we perform the substitution, we find that an arbitrary
element $c_{ik}$ of the product matrix C must be constructed as follows.
Select the ith row of the first factor B and the kth column of the
second factor A. Multiply these two together. “Multiplication”
here means to form the product of corresponding elements and
take the sum. For example,


$$\begin{array}{ll} v_1 = 3u_1 - 2u_2, & w_1 = v_1 + v_2 \\ \qquad\text{(B)} & \qquad\text{(A)} \\ v_2 = -u_1 + 4u_2, & w_2 = 2v_1 - 5v_2 \end{array}$$

Then

$$w_1 = 3u_1 - 2u_2 - u_1 + 4u_2 = 2u_1 + 2u_2$$
$$w_2 = 6u_1 - 4u_2 + 5u_1 - 20u_2 = 11u_1 - 24u_2$$

On the other hand,


$$\begin{array}{ll} 1\cdot 3 + 1\cdot(-1) = 2 & 1\cdot(-2) + 1\cdot 4 = 2 \\ 2\cdot 3 + (-5)\cdot(-1) = 11 & 2\cdot(-2) + (-5)\cdot 4 = -24 \end{array}$$


If the very same transformations are applied in the opposite
sequence, we do not get the same result.


$$\begin{array}{ll} v_1 = u_1 + u_2, & w_1 = 3v_1 - 2v_2 \\ v_2 = 2u_1 - 5u_2, & w_2 = -v_1 + 4v_2 \end{array}$$

$$w_1 = 3u_1 + 3u_2 - 4u_1 + 10u_2 = -u_1 + 13u_2$$
$$w_2 = -u_1 - u_2 + 8u_1 - 20u_2 = 7u_1 - 21u_2$$


$$\begin{array}{ll} 3\cdot 1 + (-2)\cdot 2 = -1 & 3\cdot 1 + (-2)\cdot(-5) = 13 \\ (-1)\cdot 1 + 4\cdot 2 = 7 & (-1)\cdot 1 + 4\cdot(-5) = -21 \end{array}$$


This shows that the products AB and BA are not the same. The 
ordinary commutative law of multiplication does not hold in the 
case of matrices. 
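The row-by-column rule and the failure of the commutative law can be checked mechanically. The Python sketch below uses the two transformations of the example, with `B` standing for the matrix that takes $u$ into $v$ in the first arrangement and `A` for the matrix that takes $v$ into $w$; the function name `mat_mul` is chosen here.

```python
def mat_mul(P, Q):
    """Product P Q: row i of the first factor times column k of the second."""
    return [[sum(P[i][m] * Q[m][k] for m in range(len(Q)))
             for k in range(len(Q[0]))]
            for i in range(len(P))]


B = [[3, -2], [-1, 4]]    # v1 = 3u1 - 2u2,  v2 = -u1 + 4u2
A = [[1, 1], [2, -5]]     # w1 = v1 + v2,    w2 = 2v1 - 5v2

AB = mat_mul(A, B)        # first B, then A
BA = mat_mul(B, A)        # the same transformations in the opposite sequence
print(AB, BA)
```

The two products reproduce the coefficients found above by direct substitution: 2, 2, 11, -24 in the first sequence and -1, 13, 7, -21 in the second.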




On the other hand, the associative law of multiplication is 
satisfied. 


A(BC) = (AB)C (2-3.12) 
Indeed, let us transform u to v to w to z by the following operations. 


v= Cu, w= Bv, z= Aw 
Then 
z= (ABC)u 


But this transformation could have been obtained by going from u 
directly to w and then to z, 


ABC = A(BC) 
or by going from u to v and from v directly to z, 
ABC = (AB)C 
Our customary algebra is based on six fundamental postulates:


1. Commutative law of addition: a + b = b + a
2. Associative law of addition: (a + b) + c = a + (b + c)
3. Commutative law of multiplication: ab = ba
4. Associative law of multiplication: (ab)c = a(bc)
5. Distributive law of multiplication: (a + b)c = ac + bc,
   c(a + b) = ca + cb
6. The nonfactorability of zero: If ab = 0, then a = 0, or
   b = 0, or a = b = 0.


In matrix algebra the postulates 1, 2, 4, and 5 still hold, but the 
postulates 3 and 6 are violated. The postulate 3 does not hold, 
because in defining the product of two matrices the first and the 
second factor do not enter symmetrically, since the rows of the first 
factor are combined with the columns of the second factor. 

Hence, generally

$$AB \neq BA$$
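That postulate 6 is likewise violated can be seen from the smallest possible case. The Python sketch below multiplies two invented 2 × 2 matrices, neither of which is zero, and obtains the zero matrix; the matrices `P` and `Q` are not taken from the text.

```python
def mat_mul(P, Q):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(P[i][m] * Q[m][k] for m in range(len(Q)))
             for k in range(len(Q[0]))]
            for i in range(len(P))]


zero = [[0, 0], [0, 0]]
P = [[1, 0], [0, 0]]      # invented example, not the zero matrix
Q = [[0, 0], [0, 1]]      # invented example, not the zero matrix

product = mat_mul(P, Q)   # every row of P meets a column of Q with no overlap
print(product)
```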


4. Eigenvalue analysis. If we have one single matrix A alone, 
without any second matrix B, the noncommutative property of 
matrix multiplication is irrelevant and we would think that the
algebra of the matrix A is equivalent to the algebra of any ordinary
algebraic variable x. Yet this is not the case because, although at 
present all the first five postulates of algebra are fulfilled, the 
nonfulfillment of the sixth postulate causes a profound difference. 

This postulate is closely related to a remarkable equation in 
matrix algebra which has no analogy in the realm of ordinary 
algebraic numbers. We have said that multiplication of a vector x 
by the matrix A generates a new vector b which can be conceived as a 
transformation of the original vector x. We will now ask the question 
whether or not it may happen that the new vector b has the same 
direction as the original vector x. In this case b is s simply proportional 
to x and we obtain the condition 


Ax = Ax (2-4.1) 
or written out in components, 


anty + hita H of Unin = AD, 
az% + Ago%o ee Anin = Axe (2-4.2) 


amti F anto + o Fannin = AT, 
The right side is not truly a “right side” in the sense of a given 
vector, and it is more logical to bring the right side over to the left 
side and write the entire equation as a homogeneous set of equations, 
without any right side. 


(an — A, + at To + ntn = 0 
a271 T (a22 R A)za ae = I Aann = 0 (2-4.3) 


am% T a noT si Ei T (ann — A)X n =0 


The matrix of this system of equations is still the original matrix A,
but after subtracting $\lambda$ in all the diagonal terms.

Now we know that n homogeneous linear equations in n unknowns
have no solution (outside of the vanishing of all the $x_i$, which means
that the vector x does not exist), unless one very definite condition is
fulfilled, viz., that the determinant of the system is zero.


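$$\begin{vmatrix}
a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda
\end{vmatrix} = 0 \tag{2-4.4}$$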


Now the actual expansion of a determinant according to the
original definition is a very cumbersome task if n is larger
than 4. There are other less direct but numerically simpler
methods for the determination of $\lambda$. However, for theoretical
purposes the existence of the determinant condition (4) is of
greatest importance. The technique of expanding a determinant
of the order n shows that on the left side a polynomial of the order
n in $\lambda$ appears. It will be convenient to multiply the determinant
by $(-1)^n$. Then the largest power of the polynomial appears in
the form $\lambda^n$.


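$$(-1)^n \begin{vmatrix}
a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda
\end{vmatrix}
= \lambda^n + c_{n-1}\lambda^{n-1} + c_{n-2}\lambda^{n-2} + \cdots + c_0 \tag{2-4.5}$$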


For example, let us choose n = 3 and expand the determinant (5). 


$$-\begin{vmatrix}
2 - \lambda & 0 & 3 \\
1 & -1 - \lambda & 5 \\
0 & 4 & -2 - \lambda
\end{vmatrix} = \lambda^3 + \lambda^2 - 24\lambda + 24$$


The determinant (5), if evaluated as a polynomial of $\lambda$, is called
the “characteristic polynomial” of A, and if we set this polynomial
equal to zero, we get the “characteristic equation”:

$$\lambda^n + c_{n-1}\lambda^{n-1} + c_{n-2}\lambda^{n-2} + \cdots + c_0 = 0 \tag{2-4.6}$$

The condition (4) shows that $\lambda$ has to fulfill this equation, i.e., $\lambda$
has to be one of the roots of the algebraic equation (6).
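
For small matrices the coefficients of the characteristic polynomial can be checked mechanically. The following is a minimal sketch, not the determinant expansion described above: it assumes the Faddeev-LeVerrier recurrence as a convenient substitute, and the helper names `matmul` and `char_poly` are introduced here only for illustration. Exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction


def matmul(X, Y):
    """Row-by-column product of two square matrices (lists of rows)."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Return [1, c_{n-1}, ..., c_0], the coefficients of det(lambda*I - A).

    Assumption: Faddeev-LeVerrier recurrence, chosen here for brevity;
    it is not the determinant expansion used in the text.
    """
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    M = [[Fraction(0)] * n for _ in range(n)]   # M_0 = 0
    coeffs = [Fraction(1)]                      # coefficient of lambda^n
    for k in range(1, n + 1):
        M = matmul(A, M)                        # M_k = A*M_{k-1} + c*I
        for i in range(n):
            M[i][i] += coeffs[-1]
        AM = matmul(A, M)
        trace = sum(AM[i][i] for i in range(n))
        coeffs.append(-trace / k)               # next lower coefficient
    return coeffs


A = [[2, 0, 3], [1, -1, 5], [0, 4, -2]]         # example matrix of this section
print([int(c) for c in char_poly(A)])           # prints [1, 1, -24, 24]
```

The printed list reads off the polynomial $\lambda^3 + \lambda^2 - 24\lambda + 24$ obtained above for n = 3.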

We know that an algebraic equation of the order n has always n
and only n generally complex roots. Some of the roots may collapse
into one, but then they count as multiple roots. Hence we can say
that there are definitely n and only n $\lambda$ values, called the
“characteristic values” or “eigenvalues” of the matrix A, for which the
equation (1) is solvable.

$$\lambda = \lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n \tag{2-4.7}$$


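To every possible $\lambda = \lambda_i$ a solution of the homogeneous set (1) can
be found. We tabulate these solutions as follows:

$$\begin{array}{ll}
\lambda = \lambda_1: & x = x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)} \\
\lambda = \lambda_2: & x = x_1^{(2)}, x_2^{(2)}, \ldots, x_n^{(2)} \\
\quad\vdots & \\
\lambda = \lambda_n: & x = x_1^{(n)}, x_2^{(n)}, \ldots, x_n^{(n)}
\end{array} \tag{2-4.8}$$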


The actual construction of these solutions is generally, if n goes 
beyond 4, a very cumbersome task. For the general understanding 
of the nature of matrices, however, it is enough to know that these 
solutions exist and are actually obtainable by solving a linear set of 
equations. 

The solutions of the table (8) represent n distinct vectors of the 
n-dimensional space. In some exceptional cases some (or all) of these 
vectors may collapse into one, but then we conceive such cases as 
limits of the regular cases. These solutions are called the 
“eigenvectors” or “principal axes” of the matrix A. We will denote 
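them by $u_1, u_2, \ldots, u_n$.

$$\begin{aligned}
u_1 &= (x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)}) \\
u_2 &= (x_1^{(2)}, x_2^{(2)}, \ldots, x_n^{(2)}) \\
&\;\;\vdots \\
u_n &= (x_1^{(n)}, x_2^{(n)}, \ldots, x_n^{(n)})
\end{aligned} \tag{2-4.9}$$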


The entire eigenvector analysis of the matrix A can thus be sum- 
marized as follows: 


the n eigenvalues: $\lambda = \lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n$
the n associated eigenvectors: $u_1, u_2, u_3, \ldots, u_n$


Since the eigenvectors are solutions of a homogeneous set of equa- 
tions, the solution is determined only up to a universal factor. Every 
one of the u vectors can be multiplied by an arbitrary factor and still 
remain an eigenvector. The eigenvectors are thus uniquely 
determined in their directions only, but their length (absolute value) 
is arbitrary. 


5. The Hamilton-Cayley equation. Let us consider the solution 
of the linear vector equation, 


$$(A - \lambda_1)x = 0$$

This equation defines the first principal axis $u_1$, and thus the general
solution of the equation is $x = \alpha_1 u_1$, where $\alpha_1$ is arbitrary. We now
consider the solution of another linear vector equation which is
quadratic in A.

$$(A - \lambda_1)(A - \lambda_2)x = 0$$


This equation will be satisfied by an arbitrary linear combination of 
the first two eigenvectors. 


$$x = \alpha_1 u_1 + \alpha_2 u_2$$
Similarly the equation
$$(A - \lambda_1)(A - \lambda_2)(A - \lambda_3)x = 0$$
will be satisfied by an arbitrary linear combination of the first three
eigenvectors.
$$x = \alpha_1 u_1 + \alpha_2 u_2 + \alpha_3 u_3$$

Finally the full equation which contains all the root factors:
$$(A - \lambda_1)(A - \lambda_2)(A - \lambda_3) \cdots (A - \lambda_n)x = 0$$

will be satisfied by an arbitrary linear combination of all the n
eigenvectors.
$$x = \alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n$$


But the n eigenvectors $u_1, u_2, \ldots, u_n$ include the entire space,¹ and thus
the last x becomes an arbitrary vector of the n-dimensional space.
This means that the matrix

$$H = (A - \lambda_1)(A - \lambda_2)(A - \lambda_3) \cdots (A - \lambda_n) \tag{2-5.1}$$
operating on an arbitrary vector x, gives zero.
$$Hx = 0 \tag{2-5.2}$$

This is possible only if the matrix H vanishes identically. Hence we
find that an arbitrary matrix A satisfies the following polynomial
identity.

$$(A - \lambda_1)(A - \lambda_2) \cdots (A - \lambda_n) = 0 \tag{2-5.3}$$

¹ This is not true for “defective” matrices (cf. § 11), but here the theorem is
establishable by a limit process.



We have to write this equation somewhat more lucidly, since the
$\lambda_1, \lambda_2, \ldots, \lambda_n$ are scalars (pure numbers), while A is a matrix. The
equation which defines the eigenvectors: $Ax = \lambda x$, if written in
homogeneous form

$$(A - \lambda)x = 0$$
should actually mean

$$(A - \lambda I)x = 0$$


where I is the so-called “unit matrix.” This matrix transforms any
vector into itself since it is the nature of unity that, if used as a multi-
plier, it does not change anything. If we require that

$$Iu = u$$

shall hold for any arbitrary vector u, it is necessary and sufficient that
all the diagonal elements of I shall be 1, and all the other elements 0.

$$I = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix} \tag{2-5.4}$$
Hence equation (3) becomes in proper writing,
$$(A - \lambda_1 I)(A - \lambda_2 I) \cdots (A - \lambda_n I) = 0 \tag{2-5.5}$$


We encountered this very same equation earlier in scalar form when 
we were interested in finding the eigenvalues of a matrix. The 
expansion of the determinant (4.5) gave the polynomial 


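$$\lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_0$$

and since the roots of this polynomial were denoted by $\lambda_1, \lambda_2, \ldots, \lambda_n$,
we have by the laws of algebra:

$$\lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_0 = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n)$$

This shows that by multiplying together the root factors (5) we will
get exactly the left side of (4.6), i.e., the characteristic polynomial,
with the only difference that A takes the place of $\lambda$.

$$(A - \lambda_1 I)(A - \lambda_2 I) \cdots (A - \lambda_n I) = A^n + c_{n-1}A^{n-1} + \cdots + c_0 I$$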


But the characteristic polynomial again is nothing but the determinant
(2-4.5) associated with the characteristic equation. Hence the
equation (5) may also be written in the form

$$(-1)^n \begin{vmatrix}
a_{11} - A & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - A & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} - A
\end{vmatrix} = 0 \tag{2-5.6}$$


and we obtain the remarkable theorem, discovered independently by 
Hamilton and by Cayley, that every matrix satisfies identically its 
own characteristic equation. The characteristic equation, written in
terms of the scalar $\lambda$, defines the characteristic values of the matrix,
but written in terms of the matrix A expresses an algebraic identity.

Numerical example. In the numerical example of Section 4, the 
matrix A was defined as follows: 


$$A = \begin{bmatrix} 2 & 0 & 3 \\ 1 & -1 & 5 \\ 0 & 4 & -2 \end{bmatrix}$$

The characteristic polynomial appeared in the form
$$\lambda^3 + \lambda^2 - 24\lambda + 24$$


If we square and cube the matrix A according to the rules of matrix 
multiplication, we obtain 


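$$A^2 = \begin{bmatrix} 4 & 12 & 0 \\ 1 & 21 & -12 \\ 4 & -12 & 24 \end{bmatrix}, \qquad
A^3 = \begin{bmatrix} 20 & -12 & 72 \\ 23 & -69 & 132 \\ -4 & 108 & -96 \end{bmatrix}$$

We now form $A^3 + A^2 - 24A + 24I$:

$$\begin{array}{lll}
20 + 4 - 48 + 24 = 0 & -12 + 12 - 0 + 0 = 0 & 72 + 0 - 72 + 0 = 0 \\
23 + 1 - 24 + 0 = 0 & -69 + 21 + 24 + 24 = 0 & 132 - 12 - 120 + 0 = 0 \\
-4 + 4 - 0 + 0 = 0 & 108 - 12 - 96 + 0 = 0 & -96 + 24 + 48 + 24 = 0
\end{array}$$

Hence we have demonstrated that

$$A^3 + A^2 - 24A + 24I = 0$$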
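
The same check can be repeated by machine. The sketch below is only an illustration under stated assumptions: the names `matmul` and `poly_of_matrix` are introduced here, and the polynomial is evaluated by Horner's scheme rather than by forming the powers of A separately as in the text.

```python
def matmul(X, Y):
    """Row-by-column product of two square matrices (lists of rows)."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_of_matrix(coeffs, A):
    """Evaluate coeffs[0]*A^n + coeffs[1]*A^(n-1) + ... + coeffs[n]*I.

    Horner's scheme: start from the zero matrix and repeat R <- R*A + c*I.
    """
    n = len(A)
    R = [[0] * n for _ in range(n)]
    for c in coeffs:
        R = matmul(R, A)
        for i in range(n):
            R[i][i] += c
    return R


A = [[2, 0, 3], [1, -1, 5], [0, 4, -2]]
H = poly_of_matrix([1, 1, -24, 24], A)    # A^3 + A^2 - 24A + 24I
print(H)                                  # prints [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```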




The existence of a polynomial relation of the form 
$$A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I = 0$$

distinguishes matrix algebra from ordinary algebra even in the case of
one single matrix. If x is an ordinary algebraic quantity, we can form
polynomials of first, second, third, ... order, up to any order, because
the powers of x are linearly independent of each other. No power is
ever reducible to a linear combination of lower powers (although in a
given finite range such a reduction is possible with a high degree of
accuracy, cf. VII, 9). With a matrix of n rows and n columns the
situation is different. Since $A^n$ is reducible to a linear combination
of lower powers, the same is true of $A^{n+1}, A^{n+2}, \ldots$; generally of
$A^{n+k}$ $(k = 0, 1, 2, \ldots)$. Hence any polynomial of A which is of an
order larger than n - 1 can always be exactly reduced to a poly-
nomial of not more than (n - 1)st degree.
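
The reduction of higher powers can be made concrete. The following sketch assumes the matrix and characteristic polynomial of the numerical example above; the function name `reduce_power` and the coefficient ordering (constant term first) are choices made here for illustration. Each multiplication by A shifts the powers up by one, and the $A^n$ term is then replaced by means of the Hamilton-Cayley identity.

```python
def matmul(X, Y):
    """Row-by-column product of two square matrices (lists of rows)."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def reduce_power(k, c):
    """Return [p_0, ..., p_{n-1}] with A^k = p_0*I + p_1*A + ... + p_{n-1}*A^(n-1).

    c = [c_0, c_1, ..., c_{n-1}] are the lower coefficients of the monic
    characteristic polynomial lambda^n + c_{n-1}*lambda^(n-1) + ... + c_0.
    """
    n = len(c)
    p = [1] + [0] * (n - 1)              # A^0 = I
    for _ in range(k):
        top = p[-1]                      # coefficient that would move to A^n
        p = [0] + p[:-1]                 # multiply by A: shift powers up
        p = [pi - top * ci for pi, ci in zip(p, c)]   # substitute for A^n
    return p


A = [[2, 0, 3], [1, -1, 5], [0, 4, -2]]
p = reduce_power(4, [24, -24, 1])        # A^4 in terms of I, A, A^2
print(p)                                 # prints [24, -48, 25]

# Cross-check against the directly computed fourth power.
A2 = matmul(A, A)
A4 = matmul(A2, A2)
combo = [[p[0] * (i == j) + p[1] * A[i][j] + p[2] * A2[i][j]
          for j in range(3)] for i in range(3)]
```

Here `combo` agrees with `A4`, i.e. $A^4 = 25A^2 - 48A + 24I$ for this particular matrix.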

Another important difference concerns the process of division. In 
ordinary algebra the quantity 


$$x^{-1} = \frac{1}{x}$$

exists for any value of x except x = 0 since we cannot divide by zero.
But if the zero has factors, the situation is different. In matrix
algebra the zero has the factors

$$A - \lambda_i I$$
Hence the operation
$$X^{-1} = \frac{1}{X}$$

if X is a matrix, loses its significance not only if X = 0, i.e., if all the
elements of the matrix vanish, but also if

$$X = A - \lambda_i I$$


Here X is not zero and in fact no element of X need be zero. Never- 
theless, the reciprocal of X cannot be formed. 

The reciprocal of A itself is involved in this difficulty if one of the
eigenvalues of A happens to be $\lambda_i = 0$. The problem of finding the
reciprocal or “inverse” of a matrix is of fundamental importance in
solving systems of linear equations. We see that this problem has no
solution if the matrix A has a zero eigenvalue. In the strict mathe-
matical sense the inversion problem is impossible only if one of the
eigenvalues of A is exactly zero. But from the practical standpoint
we come into great numerical difficulties not only if A has a zero
eigenvalue, but also if A has one or more very small eigenvalues. The
mathematical analysis of such “nearly singular systems” deserves
particularly close attention. In the strict sense, the inversion problem
of a matrix can be pursued without knowing anything about the
eigenvalue problem. But in actual fact we cannot understand
the peculiar behavior of singular or nearly singular systems if
we dissociate this problem from the eigenvalue problem of the
matrix.

The eigenvalue problem is of profound importance in all flutter and 
vibration phenomena, since the frequency of elastic or electric 
vibrations is determined by the eigenvalues of a certain matrix, while 
the eigenvectors or principal axes of that matrix provide the 
vibrational modes. But even purely static phenomena, such as the 
stability analysis of an airplane, or the problem of buckling, are 
equivalent to an eigenvalue problem. The eigenvalue analysis of 
matrices became thus a leading item in the engineering research of 
our days. 


6. Numerical example of a complete eigenvalue analysis. It will 
be of interest to carry through in an actual numerical example all the 
operations which lead to a complete eigenvalue analysis of a given 
matrix. Hence we will choose a particularly simple matrix of only 
3 rows and columns, in order to reduce the numerical computations 
to a minimum and yet display all the characteristic features of the 
eigenvalues and eigenvectors. 

Let the matrix A be given as follows: 


$$A = \begin{bmatrix} 33 & 16 & 72 \\ -24 & -10 & -57 \\ -8 & -4 & -17 \end{bmatrix} \tag{2-6.1}$$


Our aim will be to obtain all the three eigenvalues and eigenvectors 
associated with this matrix. 

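First we construct the characteristic equation by putting $-\lambda$ in the
diagonal and setting the determinant equal to zero. The determinant
is obtained by expanding in the elements of the first row. We
multiply by $(-1)^3 = -1$ in order to obtain the characteristic poly-
nomial with a plus sign in front of $\lambda^3$.

$$\begin{aligned}
(-1)^3 \begin{vmatrix} 33 - \lambda & 16 & 72 \\ -24 & -10 - \lambda & -57 \\ -8 & -4 & -17 - \lambda \end{vmatrix}
&= \begin{vmatrix} \lambda - 33 & -16 & -72 \\ -24 & -10 - \lambda & -57 \\ -8 & -4 & -17 - \lambda \end{vmatrix} \\
&= (\lambda - 33)[(10 + \lambda)(17 + \lambda) - 228] + 16[24(17 + \lambda) - 456] \\
&\qquad - 72[96 - 8(10 + \lambda)] \\
&= (\lambda - 33)(\lambda^2 + 27\lambda - 58) + 16(24\lambda - 48) - 72(-8\lambda + 16) \\
&= \lambda^3 + 27\lambda^2 - 58\lambda \\
&\qquad - 33\lambda^2 - 891\lambda + 1914 \\
&\qquad + 384\lambda - 768 \\
&\qquad + 576\lambda - 1152 \\
&= \lambda^3 - 6\lambda^2 + 11\lambda - 6
\end{aligned} \tag{2-6.2}$$
Hence the characteristic equation becomes
$$\lambda^3 - 6\lambda^2 + 11\lambda - 6 = 0 \tag{2-6.3}$$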


The roots of this equation are 
$$\lambda_1 = 1, \quad \lambda_2 = 2, \quad \lambda_3 = 3 \tag{2-6.4}$$

We will also check the Hamilton-Cayley identity which agrees with
the characteristic equation, but with A taking the place of $\lambda$.

$$A^3 - 6A^2 + 11A - 6I = 0 \tag{2-6.5}$$
which gives
$$A^3 = 6A^2 - 11A + 6I \tag{2-6.6}$$


In order to check this equation, we have to form the square and the 
cube of the original matrix. We know that in multiplying two mat- 
rices together we have to combine the rows of the first matrix with 
the columns of the second matrix. To keep track of the corresponding 
elements is not easy under these conditions, and errors can easily be 
made. It is better to “transpose” the first matrix by changing rows to 
columns. Then we have to multiply columns by columns which is 
much less confusing and avoids the misplacing of elements. 


The transposition (i.e., exchange of rows and columns of a matrix)
is denoted by $\tilde{A}$. In our example

$$\tilde{A} = \begin{bmatrix} 33 & -24 & -8 \\ 16 & -10 & -4 \\ 72 & -57 & -17 \end{bmatrix}$$

Hence we will obtain $A^2$ by multiplying $\tilde{A}$ and A in column-by-
column fashion. In order to indicate that we do not mean ordinary
multiplication but column by column multiplication, we will use the
symbol $\circ$.

$$A^2 = \begin{bmatrix} 33 & -24 & -8 \\ 16 & -10 & -4 \\ 72 & -57 & -17 \end{bmatrix} \circ
\begin{bmatrix} 33 & 16 & 72 \\ -24 & -10 & -57 \\ -8 & -4 & -17 \end{bmatrix} =
\begin{bmatrix} 129 & 80 & 240 \\ -96 & -56 & -189 \\ -32 & -20 & -59 \end{bmatrix}$$

We repeat the process once more and obtain $A^3$.

$$A^3 = \begin{bmatrix} 129 & -96 & -32 \\ 80 & -56 & -20 \\ 240 & -189 & -59 \end{bmatrix} \circ
\begin{bmatrix} 33 & 16 & 72 \\ -24 & -10 & -57 \\ -8 & -4 & -17 \end{bmatrix} =
\begin{bmatrix} 417 & 304 & 648 \\ -312 & -220 & -507 \\ -104 & -76 & -161 \end{bmatrix}$$


We now form the right side of equation (6). 


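$$6A^2 - 11A + 6I =$$
$$\begin{array}{lll}
6 \cdot 129 - 11 \cdot 33 + 6 = 417 & 6 \cdot 80 - 11 \cdot 16 = 304 & 6 \cdot 240 - 11 \cdot 72 = 648 \\
-6 \cdot 96 + 11 \cdot 24 = -312 & -6 \cdot 56 + 11 \cdot 10 + 6 = -220 & -6 \cdot 189 + 11 \cdot 57 = -507 \\
-6 \cdot 32 + 11 \cdot 8 = -104 & -6 \cdot 20 + 11 \cdot 4 = -76 & -6 \cdot 59 + 11 \cdot 17 + 6 = -161
\end{array}$$
The elements thus obtained agree with the elements of $A^3$, and thus
the Hamilton-Cayley equation is demonstrated.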
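
The column-by-column scheme described above is easy to mimic in code. This is a minimal sketch in which the function names `transpose` and `col_by_col` are assumptions introduced here; it forms the square and cube of the matrix (2-6.1) by transposing the first factor and then pairing columns with columns, and compares the cube with the right side of (2-6.6).

```python
def transpose(M):
    """Exchange rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def col_by_col(X, Y):
    """Entry (i, j) is the dot product of column i of X with column j of Y."""
    n = len(X)
    return [[sum(X[k][i] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[33, 16, 72], [-24, -10, -57], [-8, -4, -17]]

# Transposing the first factor turns the usual row-by-column product
# into a column-by-column product.
A2 = col_by_col(transpose(A), A)
A3 = col_by_col(transpose(A2), A)

# Right side of (2-6.6): 6*A^2 - 11*A + 6*I, formed element by element.
rhs = [[6 * A2[i][j] - 11 * A[i][j] + (6 if i == j else 0)
        for j in range(3)] for i in range(3)]

print(A3 == rhs)    # prints True
```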

We now come to determination of the eigenvectors or principal 
axes of our matrix. This means the solution of the homogeneous 
linear equations 

$$(A - \lambda_i I)x = 0 \tag{2-6.7}$$

The process has to be repeated for every $\lambda_i$, since every $\lambda_i$ is associated
with a definite principal axis. First we choose $\lambda = \lambda_1 = 1$. We
subtract 1 from the diagonal elements of the matrix and obtain the
following homogeneous linear equations.

$$\begin{aligned}
32x_1 + 16x_2 + 72x_3 &= 0 \\
-24x_1 - 11x_2 - 57x_3 &= 0 \\
-8x_1 - 4x_2 - 18x_3 &= 0
\end{aligned} \tag{2-6.8}$$
We know in advance that the determinant of this linear system is 
zero, since it was exactly this condition which led to the deter- 
mination of the eigenvalues. Now the vanishing of the determinant of 
a homogeneous linear set of equations has a very definite significance. 
It means that these n equations are not independent of each other, 
but the last equation is a consequence of the previous equations. 


Hence we can omit the third equation of the set (8) and it suffices to 
solve the remaining two (generally n — 1) equations. 

$$32x_1 + 16x_2 + 72x_3 = 0, \qquad -24x_1 - 11x_2 - 57x_3 = 0 \tag{2-6.9}$$
If these two equations are satisfied, the last one is automatically 
satisfied. 

But then the difficulty is that we have to obtain 3 unknowns from 
only 2 equations. On the other hand, we know from the homo- 
geneous nature of the eigenvalue problem that the length of the 
principal axes must remain undetermined. This leaves a universal 
factor $\alpha$ undetermined. If $x_1, x_2, x_3$ is some solution of our problem,
then $\alpha x_1, \alpha x_2, \alpha x_3$ (with any arbitrary $\alpha$) is an equally valid solution.
But then we can take advantage of the arbitrariness of $\alpha$ for normalizing
x in any arbitrary fashion. For example we may choose
$x_3 = 1$. Then equations (9) become

$$32x_1 + 16x_2 + 72 = 0, \qquad -24x_1 - 11x_2 - 57 = 0$$
and this can be written in the inhomogeneous form:
$$32x_1 + 16x_2 = -72, \qquad -24x_1 - 11x_2 = 57 \tag{2-6.10}$$


The original set of n homogeneous equations in n unknowns is thus 
replaceable by an inhomogeneous set of n — 1 equations in n — 1 
unknowns. 

These equations, unless inconsistent or redundant, are now
solvable by determinants if n does not go beyond 3 or 4, or by matrix
inversion if n is larger. In our simple example, the solution is
directly obtainable by the simple formula of solving two simultaneous
linear equations.

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 &= b_1 \\
a_{21}x_1 + a_{22}x_2 &= b_2
\end{aligned} \tag{2-6.11}$$

$$x_1 = \frac{b_1 a_{22} - b_2 a_{12}}{a_{11}a_{22} - a_{21}a_{12}}, \qquad
x_2 = \frac{a_{11}b_2 - a_{21}b_1}{a_{11}a_{22} - a_{21}a_{12}}$$
provided $a_{11}a_{22} - a_{21}a_{12} \neq 0$.


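We include the process for all three values $\lambda = 1, 2, 3$ in the follow-
ing table.

$$\begin{array}{c|c|c}
\lambda = 1 & \lambda = 2 & \lambda = 3 \\ \hline
32x_1 + 16x_2 = -72 & 31x_1 + 16x_2 = -72 & 30x_1 + 16x_2 = -72 \\
-24x_1 - 11x_2 = 57 & -24x_1 - 12x_2 = 57 & -24x_1 - 13x_2 = 57 \\ \hline
x_1 = -15/4,\; x_2 = 3 & x_1 = -4,\; x_2 = 13/4 & x_1 = -4,\; x_2 = 3
\end{array}$$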


We will now tabulate our results as follows. We have obtained 3
eigenvalues $\lambda_i$ and 3 associated vectors which we will call $u_1, u_2, u_3$.
We write the components of these vectors in 3 separate columns.
Taking advantage of the arbitrariness of the lengths, we can multiply
each of these vectors by any constant. Hence we can eliminate
fractions, and the vector (-15/4, 3, 1) may be replaced by
(-15, 12, 4). Our table then looks as follows:

$$\begin{array}{c|c|c}
\lambda = 1 & \lambda = 2 & \lambda = 3 \\ \hline
u_1 & u_2 & u_3 \\ \hline
-15 & -16 & -4 \\
12 & 13 & 3 \\
4 & 4 & 1
\end{array} \tag{2-6.12}$$
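
The procedure just described (drop the last equation, set $x_3 = 1$, solve the remaining pair by the determinant formula (2-6.11), then clear fractions) can be sketched as follows. The function names `solve_2x2`, `eigenvector_3x3`, and `clear_fractions` are hypothetical helpers introduced for this illustration, and the sketch assumes that the last component of the eigenvector is not zero, so that $x_3 = 1$ is an admissible normalization.

```python
from fractions import Fraction
from math import gcd


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x1 + a12*x2 = b1, a21*x1 + a22*x2 = b2 by determinants."""
    det = a11 * a22 - a21 * a12
    if det == 0:
        raise ValueError("the two equations are inconsistent or redundant")
    return Fraction(b1 * a22 - b2 * a12, det), Fraction(a11 * b2 - a21 * b1, det)


def eigenvector_3x3(A, lam):
    """Solve (A - lam*I)x = 0 from the first two equations with x_3 = 1."""
    a11, a12, b1 = A[0][0] - lam, A[0][1], -A[0][2]
    a21, a22, b2 = A[1][0], A[1][1] - lam, -A[1][2]
    x1, x2 = solve_2x2(a11, a12, a21, a22, b1, b2)
    return [x1, x2, Fraction(1)]


def clear_fractions(v):
    """Multiply a vector by the least common denominator of its components."""
    m = 1
    for x in v:
        m = m * x.denominator // gcd(m, x.denominator)
    return [int(x * m) for x in v]


A = [[33, 16, 72], [-24, -10, -57], [-8, -4, -17]]
for lam in (1, 2, 3):
    print(lam, clear_fractions(eigenvector_3x3(A, lam)))
```

For the three eigenvalues this reproduces the columns of the table (2-6.12).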




While this solution came about very easily in our simple example, 
we can imagine that in the case of matrices of high order the pro- 
cedure is much more difficult, and methods have to be designed by 
which the eigenvalues and eigenvectors of such matrices become 
numerically accessible. We will discuss such methods later in greater 
detail. 

We would think that our eigenvalue analysis is now complete. 
Yet this is not the case. There is still one feature of matrix algebra 
which we have not considered up to now and which is of funda- 
mental importance. Any given matrix A is automatically associated 
with another matrix which is inseparably attached to it. This is the 
“transposed matrix” or briefly “transpose” of A which we will call $\tilde{A}$.

The arrangement of the matrix components into rows and 
columns is somewhat arbitrary. We have a square in front of us, 
but what is “row” and what is “column” depends on how we 
look at this square. A square displays complete right-left and
up-down symmetry.

[Figure: a square array labeled ROWS and COLUMNS.]

We look at our square in the normal way. We designate certain
elements as being in a “row,” certain others as being in a “column.”
But turning the square 90° and looking at it again, we find that the
previous rows turn to columns and the columns to rows. We thus
have a duality associated with every matrix. The matrix A is
inevitably associated with its transpose $\tilde{A}$, and we cannot operate
with A without simultaneously operating also with the transposed
matrix $\tilde{A}$. In ordinary algebra we have a somewhat analogous
phenomenon in the field of complex numbers. The complex number
a + bi is inevitably associated with the “conjugate complex”
number a - bi, and both appear simultaneously in many algebraic
problems. For example, in solving an algebraic equation with real
coefficients, the roots a + bi and a - bi always appear together, and
one is inseparable from the other. The operation of changing A into
$\tilde{A}$ thus corresponds in ordinary algebra to the operation of changing
i to -i.

Our eigenvalue analysis is thus incomplete if we do not extend it to
the transposed matrix $\tilde{A}$. Here the defining equation of the
eigenvalue problem becomes

$$\tilde{A}x = \lambda x \tag{2-6.13}$$

In principle we could think that this problem is completely separated
from the previous one. However, as far as the eigenvalues $\lambda_i$ are
concerned, the two problems coincide. We have seen that the
eigenvalues $\lambda$ of a matrix satisfy a determinant condition. We know
that a determinant does not change its value if rows and columns are
interchanged. Hence the characteristic polynomial associated with $\tilde{A}$
is exactly the same as that associated with A. Consequently the
eigenvalues of $\tilde{A}$ are identical with the eigenvalues of A. But each
eigenvalue has a dual aspect inasmuch as we can use the same $\lambda$ for
solving the equation $Ax = \lambda x$ and the equation $\tilde{A}y = \lambda y$. In the
first case we determine the principal axes of A, in the second case the
principal axes of $\tilde{A}$.

Hence we will go through our previous procedure once more, but
now using the transposed matrix $\tilde{A}$ instead of A.

$$\tilde{A} = \begin{bmatrix} 33 & -24 & -8 \\ 16 & -10 & -4 \\ 72 & -57 & -17 \end{bmatrix}$$


We need not repeat the solution of the characteristic equation since 
it remains unchanged. Once more we obtain the 3 characteristic 
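values $\lambda = 1, 2, 3$. However, the table of the equations which lead
to the principal axes will now appear as follows:

$$\begin{array}{c|c|c}
\lambda = 1 & \lambda = 2 & \lambda = 3 \\ \hline
32x_1 - 24x_2 = 8 & 31x_1 - 24x_2 = 8 & 30x_1 - 24x_2 = 8 \\
16x_1 - 11x_2 = 4 & 16x_1 - 12x_2 = 4 & 16x_1 - 13x_2 = 4 \\ \hline
x_1 = 1/4,\; x_2 = 0 & x_1 = 0,\; x_2 = -1/3 & x_1 = 4/3,\; x_2 = 4/3
\end{array}$$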


Once more we tabulate our results, denoting the eigenvectors of
the transposed matrix $\tilde{A}$ by $v_1, v_2, v_3$. Once more we can avoid the
fractions by multiplying by a suitable factor, since the length of the
principal axes is undetermined.

$$\begin{array}{c|c|c}
v_1 & v_2 & v_3 \\ \hline
1 & 0 & 4 \\
0 & -1 & 4 \\
4 & 3 & 3
\end{array} \tag{2-6.14}$$


The tables (12) and (14) together represent the complete eigenvalue
analysis of our problem. It consists of n scalars, viz., the eigenvalues

$$\lambda_1, \lambda_2, \ldots, \lambda_n$$

and 2n vectors, viz., the n eigenvectors or principal axes of the
matrix A:

$$u_1, u_2, \ldots, u_n$$
and the n eigenvectors or principal axes of the matrix $\tilde{A}$:
$$v_1, v_2, \ldots, v_n$$

In our numerical example the complete eigenvalue analysis of the
given matrix A can be tabulated as follows:

Complete eigenvalue analysis

$$\begin{array}{c|c|c||c|c|c}
\lambda = 1 & \lambda = 2 & \lambda = 3 & \lambda = 1 & \lambda = 2 & \lambda = 3 \\ \hline
u_1 & u_2 & u_3 & v_1 & v_2 & v_3 \\ \hline
-15 & -16 & -4 & 1 & 0 & 4 \\
12 & 13 & 3 & 0 & -1 & 4 \\
4 & 4 & 1 & 4 & 3 & 3
\end{array} \tag{2-6.15}$$


These two sets of vectors are in a remarkable reciprocity relation to 
each other. The u vectors in themselves do not reveal any particular 
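inner relations, nor do the v vectors in themselves. But let us form
the dot product of one u and one v vector. For example, pairing $u_1$
with all the v vectors, we obtain

$$\begin{aligned}
u_1 \cdot v_1 &= -15 \cdot 1 + 12 \cdot 0 + 4 \cdot 4 = 1 \\
u_1 \cdot v_2 &= -15 \cdot 0 - 12 \cdot 1 + 4 \cdot 3 = 0 \\
u_1 \cdot v_3 &= -15 \cdot 4 + 12 \cdot 4 + 4 \cdot 3 = 0
\end{aligned}$$
Similarly pairing $u_2$ with all the v vectors, we obtain
$$\begin{aligned}
u_2 \cdot v_1 &= -16 \cdot 1 + 13 \cdot 0 + 4 \cdot 4 = 0 \\
u_2 \cdot v_2 &= -16 \cdot 0 - 13 \cdot 1 + 4 \cdot 3 = -1 \\
u_2 \cdot v_3 &= -16 \cdot 4 + 13 \cdot 4 + 4 \cdot 3 = 0
\end{aligned}$$
Finally, pairing $u_3$ with all the v vectors, we obtain
$$\begin{aligned}
u_3 \cdot v_1 &= -4 \cdot 1 + 3 \cdot 0 + 1 \cdot 4 = 0 \\
u_3 \cdot v_2 &= -4 \cdot 0 - 3 \cdot 1 + 1 \cdot 3 = 0 \\
u_3 \cdot v_3 &= -4 \cdot 4 + 3 \cdot 4 + 1 \cdot 3 = -1
\end{aligned}$$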


If the dot product of two vectors comes out as zero, this means, in the 
language of geometry, that these two vectors are perpendicular or 
orthogonal to each other. We see that any vector of the u set is 
orthogonal to any vector of the v set, except its own pair. 
$$u_i \cdot v_k = 0 \quad (i \neq k) \tag{2-6.16}$$
The dot product $u_1 \cdot v_1$ came out as 1, the dot product $u_2 \cdot v_2$ as -1,
and the dot product $u_3 \cdot v_3$ as -1. We can see without difficulty that
these dot products could have come out as anything we like, since
the lengths of the principal axes remained undetermined. We now
obtain a practical normalization of the free lengths of the principal
axes by demanding that the dot products $u_i \cdot v_i$ shall all become 1.
$$u_i \cdot v_i = 1 \tag{2-6.17}$$
This leaves the lengths of $u_i$ still undetermined, but the lengths of
the vectors $v_i$ can always be adjusted in such a way that the con-
dition (17) shall be satisfied (leaving apart the singular case, which
occurs only in the case of “defective matrices,” cf. § 11, that a certain
$u_i \cdot v_i$ may come out as 0). If originally a certain $u_i \cdot v_i$ gives $c_i$, we
change $v_i$ to
$$\bar{v}_i = \frac{1}{c_i} v_i \tag{2-6.18}$$

and then in the new $\bar{v}_i$ system the condition (17) will already hold.


In our numerical scheme the vector $v_1$ is already properly
normalized since $u_1 \cdot v_1$ was accidentally 1. The vectors $v_2$ and $v_3$ have
to be divided by -1, and thus the final v vector scheme becomes

$$\begin{array}{c|c|c}
v_1 & v_2 & v_3 \\ \hline
1 & 0 & -4 \\
0 & 1 & -4 \\
4 & -3 & -3
\end{array} \tag{2-6.19}$$


A dual set of vectors in which any vector of the one set is orthogonal 
to all vectors of the other set, except its own pair, is called a
“biorthogonal” set. If in addition the dot product of every vector with
its own pair is normalized to 1, we speak of a biorthogonal and
normalized set of vectors.

We will now omit the vertical dividing lines which separate the 
vectors $u_i$ from each other and the vectors $v_i$ from each other. Then
we obtain n columns of elements, n in each column, which together
form an n by n matrix. We thus obtain one matrix U which includes
all the vectors $u_i$, and one matrix V which includes all the vectors $v_i$.
Hence the results of a complete eigenvalue analysis can be stated in
still different fashion by giving the n eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, and
the two n by n matrices U and V. In our example,

$$\lambda_i = 1, 2, 3$$

$$U = \begin{bmatrix} -15 & -16 & -4 \\ 12 & 13 & 3 \\ 4 & 4 & 1 \end{bmatrix}, \qquad
V = \begin{bmatrix} 1 & 0 & -4 \\ 0 & 1 & -4 \\ 4 & -3 & -3 \end{bmatrix} \tag{2-6.20}$$

The inner relations between these two matrices can be expressed in
the form of a single matrix equation.

$$\tilde{U}V = I \tag{2-6.21}$$


Indeed, to multiply two matrices together means to form the dot
products of rows and columns. The rows of the first matrix are
multiplied by the columns of the second matrix. But the trans-
position of U has the effect that the rows of $\tilde{U}$ are actually the
columns of U. And thus on the left side of (21) we have the column-
by-column products of the matrices U and V. The fact that these
products come out as 1 for corresponding pairs and 0 for all other
combinations means that the diagonal elements of the product matrix
become 1 and all other elements 0. But this is exactly the definition
of the unit matrix I. An alternative way of stating (21) is the
equation

$$\tilde{V}U = I \tag{2-6.22}$$
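
The biorthogonality and the normalization step (2-6.18) can be checked together. The sketch below is an illustration only; the helper names `dot`, `gram`, and `normalize` are assumptions introduced here, and the vectors are those of the tables (12) and (14), each stored as a list of components.

```python
from fractions import Fraction

# Eigenvectors of A and of its transpose, before normalization.
us = [[-15, 12, 4], [-16, 13, 4], [-4, 3, 1]]
vs = [[1, 0, 4], [0, -1, 3], [4, 4, 3]]


def dot(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def gram(us, vs):
    """Matrix of all dot products u_i . v_k."""
    return [[dot(u, v) for v in vs] for u in us]


def normalize(us, vs):
    """Rescale each v_i by 1/c_i, where c_i = u_i . v_i, as in (2-6.18)."""
    out = []
    for u, v in zip(us, vs):
        c = dot(u, v)
        if c == 0:
            raise ValueError("u_i . v_i = 0: the defective case")
        out.append([Fraction(x, c) for x in v])
    return out


print(gram(us, vs))              # prints [[1, 0, 0], [0, -1, 0], [0, 0, -1]]
vs_bar = normalize(us, vs)
identity = gram(us, vs_bar)      # all dot products after normalization
```

After the rescaling, `identity` is the unit matrix, which is the content of (2-6.21) for this example.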


7. Algebraic treatment of the orthogonality of eigenvectors. What 
we have demonstrated purely numerically can be corroborated quite 
generally by the operations of matrix algebra. For this purpose we 
first reformulate the general definition of an eigenvalue problem. 


$$Ax = \lambda x \tag{2-7.1}$$

This equation defines only one principal axis at a time, because $\lambda$ is
here chosen as one of the eigenvalues, and for each $\lambda = \lambda_i$ the
equation has to be repeated. The equation (1) has thus to be used
n times, for $\lambda = \lambda_1, \lambda_2, \ldots, \lambda_n$.

We will now include all the eigenvectors of A in a single matrix 
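equation. For this purpose we introduce a matrix $\Lambda$, defined as
follows:

$$\Lambda = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix} \tag{2-7.2}$$

A matrix of this form, which has only diagonal elements (all the
other elements being zero) is called a “diagonal matrix.” The
operation with such a matrix is particularly simple. Let us use $\Lambda$ as
a first factor (called “premultiplication”). By the general rule of
matrix multiplication we get

$$\Lambda A = \begin{bmatrix}
\lambda_1 a_{11} & \lambda_1 a_{12} & \cdots & \lambda_1 a_{1n} \\
\lambda_2 a_{21} & \lambda_2 a_{22} & \cdots & \lambda_2 a_{2n} \\
\vdots & \vdots & & \vdots \\
\lambda_n a_{n1} & \lambda_n a_{n2} & \cdots & \lambda_n a_{nn}
\end{bmatrix} \tag{2-7.3}$$

We see that the premultiplication by $\Lambda$ has the effect that the
successive rows of A are multiplied by $\lambda_1, \lambda_2, \ldots, \lambda_n$.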


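Let us now multiply by $\Lambda$ as a second factor (called “post-
multiplication”).

$$A\Lambda = \begin{bmatrix}
\lambda_1 a_{11} & \lambda_2 a_{12} & \cdots & \lambda_n a_{1n} \\
\lambda_1 a_{21} & \lambda_2 a_{22} & \cdots & \lambda_n a_{2n} \\
\vdots & \vdots & & \vdots \\
\lambda_1 a_{n1} & \lambda_2 a_{n2} & \cdots & \lambda_n a_{nn}
\end{bmatrix} \tag{2-7.4}$$

Hence the postmultiplication by $\Lambda$ has the effect that the successive
columns of A are multiplied by $\lambda_1, \lambda_2, \ldots, \lambda_n$.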
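
A small numerical illustration of the two rules may help. The 2 by 2 matrix and the diagonal entries below are hypothetical values chosen here only to make the row and column scaling visible; `diag` and `matmul` are helper names introduced for this sketch.

```python
def diag(values):
    """Diagonal matrix with the given diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    """Row-by-column product of two square matrices (lists of rows)."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1, 2], [3, 4]]       # hypothetical matrix, not taken from the text
Lam = diag([10, 100])      # hypothetical diagonal entries

pre = matmul(Lam, A)       # premultiplication: rows of A are scaled
post = matmul(A, Lam)      # postmultiplication: columns of A are scaled
print(pre)                 # prints [[10, 20], [300, 400]]
print(post)                # prints [[10, 200], [30, 400]]
```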
We now write the matrix equation 


$$AU = U\Lambda \tag{2-7.5}$$


If we write out this equation in components, we find that the first 
column of the resultant equation scheme defines the first principal 
axis $u_1$ (i.e., the first column of the matrix U), the second column
defines the second principal axis $u_2$, and so on. The entire principal
axis problem is now included in a single matrix equation. 

The principal axis problem of the transposed matrix $\tilde{A}$ is similarly
included in the matrix equation

$$\tilde{A}V = V\Lambda \tag{2-7.6}$$
which defines all the “adjoint” axes $v_1, v_2, \ldots, v_n$. They are the
successive columns of the matrix V.


Now the following fundamental rule of matrix algebra is directly 
provable on the basis of the definition of matrix multiplication. 


$$\widetilde{AB} = \tilde{B}\tilde{A} \tag{2-7.7}$$

“The transpose of a product is equal to the product of the transposed
matrices, but reversing their sequence.” The same rule holds for any
number of factors, e.g.,

$$\widetilde{(ABC)} = \tilde{C}\tilde{B}\tilde{A} \tag{2-7.8}$$

We must know, furthermore, that the transposition of a diagonal
matrix $\Lambda$ leaves that matrix unchanged, since the diagonal terms are
not affected by the process of changing rows to columns.

$$\tilde{\Lambda} = \Lambda \tag{2-7.9}$$

1 Exactly the same rule holds for the operation “inverse”, i.e., to raise a
product of matrices to the power -1:

$$(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$$


Finally, “the transposition of the transposition restores the original
matrix”:

$$\tilde{\tilde{A}} = A \tag{2-7.10}$$

which is the direct consequence of the fact that changing rows to
columns interchanges the sequence of the subscripts:

$$\tilde{a}_{ik} = a_{ki}$$

but two such interchanges restore the original sequence.

$$\tilde{\tilde{a}}_{ik} = \tilde{a}_{ki} = a_{ik}$$


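Let us transpose on both sides of (6):

$$\tilde{V}A = \Lambda\tilde{V} \tag{2-7.11}$$
and let us postmultiply this equation by U
$$\tilde{V}AU = \Lambda\tilde{V}U \tag{2-7.12}$$
On the other hand, let us premultiply (7.5) by $\tilde{V}$
$$\tilde{V}AU = \tilde{V}U\Lambda \tag{2-7.13}$$
Since the left sides are equal, we obtain
$$\Lambda\tilde{V}U = \tilde{V}U\Lambda \tag{2-7.14}$$
or, denoting the product $\tilde{V}U$ by W,
$$\tilde{V}U = W \tag{2-7.15}$$
we have
$$\Lambda W = W\Lambda \tag{2-7.16}$$

This means that the matrix W is commutative with the diagonal
matrix $\Lambda$. We may also write

$$\Lambda W - W\Lambda = 0 \tag{2-7.17}$$
which means, in view of (3) and (4),

$$\begin{bmatrix}
0 & (\lambda_1 - \lambda_2)w_{12} & \cdots & (\lambda_1 - \lambda_n)w_{1n} \\
(\lambda_2 - \lambda_1)w_{21} & 0 & \cdots & (\lambda_2 - \lambda_n)w_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
(\lambda_n - \lambda_1)w_{n1} & (\lambda_n - \lambda_2)w_{n2} & \cdots & 0
\end{bmatrix} = 0 \tag{2-7.18}$$
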


But then, assuming that the roots $\lambda_i$ of the characteristic equation
are all different, we find that all the $w_{ik}$ $(i \neq k)$ must be zero, i.e., W
must be a pure diagonal matrix

$$W = \begin{bmatrix}
w_1 & 0 & \cdots & 0 \\
0 & w_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & w_n
\end{bmatrix} \tag{2-7.19}$$

This proves already the biorthogonality of the vectors $u_i$ and $v_i$,
since (19) expresses the fact that the dot product of any $u_i$ with any
$v_k$ $(i \neq k)$ must be zero. The further condition that all the $w_i$
become 1 cannot be proved, since this is a matter of definition. The
equations defining the matrix U and the matrix V are such that each
column of the matrices U and V can be multiplied by an arbitrary
factor, and now we dispose of half of these factors in such a way that
the dot products $u_i \cdot v_i$ become 1. Then W becomes the unit matrix I,
and we obtain

$$\tilde{V}U = \tilde{U}V = I \tag{2-7.20}$$


Here we have the fundamental relation between the principal
axes $u_i$ and the adjoint axes $v_i$ (the principal axes of the transposed
matrix) expressed as a matrix equation. In consequence of (20) we
can say that the matrices U and V are in a reciprocity relation to
each other:

$$\begin{aligned}
V &= \tilde{U}^{-1}, & \tilde{V} &= U^{-1} \\
U &= \tilde{V}^{-1}, & \tilde{U} &= V^{-1}
\end{aligned} \tag{2-7.21}$$

One matrix is the “inverse transpose” of the other.


A further remarkable relation is obtainable by postmultiplying
(5) by $\tilde{V}$:

$$AU\tilde{V} = U\Lambda\tilde{V}$$
which gives, in view of (20),
$$A = U\Lambda\tilde{V} \tag{2-7.22}$$
Similarly,
$$\tilde{A} = V\Lambda\tilde{U} \tag{2-7.23}$$

Equation (22) shows that the original matrix A is obtainable by
multiplying three matrices together, viz., the matrix U, the diagonal
matrix $\Lambda$, and the matrix $\tilde{V}$.
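
For the numerical example of Section 6 the relations (2-7.5) and (2-7.22) can be confirmed directly. The sketch below assumes the matrices U and V of (2-6.20), with V already normalized; `matmul` and `transpose` are helper names introduced here.

```python
def matmul(X, Y):
    """Row-by-column product of two square matrices (lists of rows)."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(M):
    """Exchange rows and columns."""
    return [list(col) for col in zip(*M)]


A = [[33, 16, 72], [-24, -10, -57], [-8, -4, -17]]
U = [[-15, -16, -4], [12, 13, 3], [4, 4, 1]]       # columns u_1, u_2, u_3
V = [[1, 0, -4], [0, 1, -4], [4, -3, -3]]          # columns v_1, v_2, v_3
Lam = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]            # diagonal matrix of eigenvalues

AU = matmul(A, U)
ULam = matmul(U, Lam)
rebuilt = matmul(ULam, transpose(V))               # U * Lam * V-transposed

print(AU == ULam)       # prints True, i.e. AU = U*Lam
print(rebuilt == A)     # prints True, i.e. A = U*Lam*V-transposed
```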


However, complete solution of the eigenvalue problem solves also
the problem of inverting the matrix. Let us write once more the
fundamental defining equation of the principal axis problem of
the matrix A

$$Ax = \lambda x$$
Premultiplying by $A^{-1}$, we obtain
$$x = \lambda A^{-1}x \quad \text{or} \quad A^{-1}x = \lambda^{-1}x$$
This equation defines the principal axes of $A^{-1}$, and we come to the
conclusion that the same vector $u_i$ which was a principal axis of A
is also a principal axis of $A^{-1}$, while the associated eigenvalue is the
reciprocal of the original eigenvalue $\lambda_i$. Hence the solution of the
eigenvalue problem of $A^{-1}$ is given as follows:

$$\lambda' = \frac{1}{\lambda_1}, \frac{1}{\lambda_2}, \ldots, \frac{1}{\lambda_n}; \qquad U' = U, \quad V' = V \tag{2-7.24}$$

The diagonal matrix $\Lambda'$ contains now the reciprocals of the previous
$\lambda_i$ and is thus equal to the inverse of the previous matrix $\Lambda$. If we
apply the relation (22) to the new eigenvalue problem, we obtain

$$A^{-1} = U\Lambda^{-1}\tilde{V}$$

The inverse of A is thus generated purely in terms of the principal
axes and eigenvalues of A.
Going back to our previous numerical problem, we have

$$U = \begin{pmatrix} -15 & -16 & -4 \\ 12 & 13 & 3 \\ 4 & 4 & 1 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix},$$

and

$$U\Lambda\tilde{V} = \begin{pmatrix} -15 & -16 & -4 \\ 12 & 13 & 3 \\ 4 & 4 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 4 \\ 0 & 2 & -6 \\ -12 & -12 & -9 \end{pmatrix} = \begin{pmatrix} 33 & 16 & 72 \\ -24 & -10 & -57 \\ -8 & -4 & -17 \end{pmatrix}$$

The resultant product coincides with the given matrix A.
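The product can also be checked mechanically. The following is only a minimal Python sketch and not part of the original text: the helper `matmul` and the variable names are our own choices, and the rows of `V_t` are the adjoint axes $v_i$ of the numerical example.

```python
def matmul(P, Q):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

U = [[-15, -16, -4], [12, 13, 3], [4, 4, 1]]    # columns: principal axes u_i
V_t = [[1, 0, 4], [0, 1, -3], [-4, -4, -3]]     # rows: adjoint axes v_i (V transposed)
Lam = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]         # eigenvalues on the diagonal

identity = matmul(V_t, U)             # relation (20): V~ U = I
A = matmul(U, matmul(Lam, V_t))       # relation (22): A = U Lambda V~

print(identity)
print(A)
```

Integer arithmetic is used throughout, so the comparison with the given matrix is exact.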


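We now repeat the procedure, but changing the $\lambda_i$ to their
reciprocals:

$$\tilde{V} = \begin{pmatrix} 1 & 0 & 4 \\ 0 & 1 & -3 \\ -4 & -4 & -3 \end{pmatrix}, \qquad \Lambda^{-1}\tilde{V} = \begin{pmatrix} 1 & 0 & 4 \\ 0 & \tfrac{1}{2} & -\tfrac{3}{2} \\ -\tfrac{4}{3} & -\tfrac{4}{3} & -1 \end{pmatrix}$$

$$U\Lambda^{-1}\tilde{V} = \begin{pmatrix} -15 & -16 & -4 \\ 12 & 13 & 3 \\ 4 & 4 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 4 \\ 0 & \tfrac{1}{2} & -\tfrac{3}{2} \\ -\tfrac{4}{3} & -\tfrac{4}{3} & -1 \end{pmatrix} = \begin{pmatrix} -\tfrac{29}{3} & -\tfrac{8}{3} & -32 \\ 8 & \tfrac{5}{2} & \tfrac{51}{2} \\ \tfrac{8}{3} & \tfrac{2}{3} & 9 \end{pmatrix}$$

In order to demonstrate that the final result is actually the reciprocal
(or inverse) matrix $A^{-1}$, we form the product $AA^{-1}$:

$$AA^{-1} = \begin{pmatrix} 33 & 16 & 72 \\ -24 & -10 & -57 \\ -8 & -4 & -17 \end{pmatrix} \begin{pmatrix} -\tfrac{29}{3} & -\tfrac{8}{3} & -32 \\ 8 & \tfrac{5}{2} & \tfrac{51}{2} \\ \tfrac{8}{3} & \tfrac{2}{3} & 9 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

The product of the two matrices gives the unit matrix I, which
demonstrates that one matrix is the reciprocal (or inverse) of the
other.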
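The same inversion can be sketched in a few lines of Python. This is an illustration under our own naming (`matmul`, `Lam_inv`, `A_inv` are not from the text); exact rational arithmetic from the standard `fractions` module is assumed so that the fractions of the inverse are reproduced without rounding.

```python
from fractions import Fraction

def matmul(P, Q):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

A = [[33, 16, 72], [-24, -10, -57], [-8, -4, -17]]
U = [[-15, -16, -4], [12, 13, 3], [4, 4, 1]]    # columns: principal axes u_i
V_t = [[1, 0, 4], [0, 1, -3], [-4, -4, -3]]     # rows: adjoint axes v_i
eigenvalues = [1, 2, 3]

# Diagonal matrix holding the reciprocals of the eigenvalues.
Lam_inv = [[Fraction(int(i == j), eigenvalues[i]) for j in range(3)]
           for i in range(3)]

A_inv = matmul(U, matmul(Lam_inv, V_t))   # A^{-1} = U Lambda^{-1} V~
check = matmul(A, A_inv)                  # should be the unit matrix

for row in A_inv:
    print([str(entry) for entry in row])
```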

We will seldom take recourse to this method of inverting a matrix, 
since inversion of a matrix is generally a much simpler task than 
solution of the complete eigenvalue problem. But if we have a 
problem in which the complete exploration of the properties of a 
given matrix A is demanded, it may be necessary to display all the 
eigenvectors and eigenvalues of A. In that case it can easily be of 
value that the inversion problem of the matrix is already included in 
the eigenvalue analysis. Moreover, for the critical study of “nearly
singular systems,” whose inversion offers great practical difficulties,
the relation between the eigenvalue problem and the inversion
problem is of inestimable importance.


8. The eigenvalue problem in geometrical interpretation. The 
algebraic treatment of the eigenvalue problem puts the spotlight on 
certain characteristic features of matrix algebra. In ordinary algebra 
the products ab and ba are interchangeable. In matrix algebra this 
is not the case, except under restricted conditions. We have found, 
for example, that the interchangeability of the products WA and 
AW, where A is a diagonal matrix with noncollapsing elements, led 
to the conclusion that W itself has to be a diagonal matrix. We have 
also found that AB = 0 does not lead to the conclusion that either 
A or B has to be zero. In our numerical example the product 


$$(A - I)(A - 2I)(A - 3I) = (A - I)(A^2 - 5A + 6I)$$


vanished. And yet neither $A - I$ nor $A^2 - 5A + 6I$ was zero.
These are characteristic differences for which we have to watch if we 
operate with matrices. 
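A short numerical check makes this difference from ordinary algebra tangible. The sketch below is ours (the helpers `matmul` and `combine` are hypothetical names) and uses the matrix A of the numerical example with the eigenvalues 1, 2, 3.

```python
def matmul(P, Q):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

def combine(terms):
    """Linear combination sum(c * M) of equally sized square matrices."""
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)]
            for i in range(n)]

A = [[33, 16, 72], [-24, -10, -57], [-8, -4, -17]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A2 = matmul(A, A)

first = combine([(1, A), (-1, I)])              # A - I
second = combine([(1, A2), (-5, A), (6, I)])    # A^2 - 5A + 6I
product = matmul(first, second)

print(product)   # every element vanishes, although neither factor is zero
```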

The purely formalistic algebraic operations with matrices, while 
they produce spectacular results, can easily blind us to certain 
deeper implications of the matrix problem. Hence we will enlarge the 
picture and make it much more meaningful if we associate with it a 
certain space structure which gives a geometrical interpretation to 
the operation with matrices. The operation with matrices is in 
closest relation to the analytical geometry of second-order surfaces. 
The entire eigenvalue and principal axis problem is intimately 
connected with the geometrical properties of the second-order sur- 
faces, called in our ordinary three-dimensional space ellipsoids and 
hyperboloids. 

The theory of curves and surfaces of second order has a long and 
inspiring history. The Greeks spent a tremendous amount of 
ingenuity on the geometrical investigation of the conic sections, which 
they have called ellipses, parabolas, and hyperbolas. Apollonius of 
Perga, called “the great geometer,” was in possession of all the basic 
properties of conic sections. He obtained his results by partly 
analytical and partly projective methods. When almost two thousand
years later Descartes invented analytic geometry and showed how 
the problems of geometry can be solved with the help of algebra, he 
introduced a new and powerful mechanism to the systematic
exploration of geometrical problems. Yet there was no basic theorem
in the realm of the conic sections which the Greeks did not discover 
earlier by sheer ingenuity, without the help of algebra. 

The Greeks did not know to what practical use the theory of conic 
sections may be put. Pragmatic evaluation of things was completely 
foreign to them, and they conceived the occupation with geometry as 
a privilege of the intellect, enjoyed for its own sake and not for any 
pragmatic gains. But in later ages this purely esthetic occupation 
with geometry paid rich dividends. When Kepler evaluated the 
observations of Tycho Brahe and came to the conclusion that the 
planets revolve not in circles but in ellipses around the sun, he based 
his calculations directly on the results of Apollonius. Without 
Greek geometry, Kepler could not have obtained his results. On the 
other hand, Newton’s theory of gravity and theory of motion could 
not have come into being without the preliminary results of Kepler. 
And thus we see the direct line from the Greek investigation of the 
conic sections to Kepler, Newton, the foundation of physics and 
engineering in the eighteenth century, and their development to the 
present standards. 

However, astronomy and physics are not the only instances in 
which the theory of conic sections plays a fundamental role. The 
analytic geometry of second-order curves and surfaces can be 
expanded from spaces of two and three dimensions to spaces of any 
dimensions. And then the discovery was made that the entire theory 
of linear operators—whether they appear as systems of ordinary 
linear algebraic equations, or as ordinary or partial differential 
equations, or as integral equations—can be formulated as a 
geometrical problem, associated with a certain second-order surface. 
The conic sections have been lifted out of the plane and put in a 
much more elaborate background. The space with which we operate 
is no longer a space of two or three dimensions, but a space of many 
dimensions and possibly even a space of infinitely many dimensions. 
But the quadratic surfaces, which find their place in these poly- 
dimensional spaces, still reflect the same fundamental properties that 
the Greeks discovered in their studies of the conic sections. 

In the analytic geometry of conic sections the equation of an 
ellipse is usually given in the following form: 

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

The equation of a hyperbola is usually written in the form

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$$

Then in solid analytic geometry, i.e., the analytic geometry of the
three-dimensional space, the variable z is added, and we write the
equation of an ellipsoid in the form

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$$


If we want to eliminate the accidental numbers 2 and 3 from the 
study of spaces of arbitrary dimensions and formulate our equations 
in a way which leaves it free to choose any number of dimensions, the 
customary notations of analytic geometry have to be profoundly 
modified. We cannot use different letters x, y, z, ... for the variables,
since we would run out of letters. Moreover, we could never discover 
the inherent laws which govern our equations if we did not introduce 
a more systematic notation which takes into account the homogeneous 
nature of space in a more adequate fashion. The variables x, y, z 
have to be changed to $x_1$, $x_2$, $x_3$. The subscript notation has
immediately the consequence that the number of dimensions can be 
systematically extended to any number we want. 

Hence we will write the equation of an ellipse in the form 


$$\lambda_1 x_1^2 + \lambda_2 x_2^2 = 1 \tag{2-8.1}$$

and the equation of an ellipsoid in the form

$$\lambda_1 x_1^2 + \lambda_2 x_2^2 + \lambda_3 x_3^2 = 1 \tag{2-8.2}$$

Then, if we want the equation of an ellipsoid in n dimensions, we can
immediately generalize the previous equations to

$$\lambda_1 x_1^2 + \lambda_2 x_2^2 + \cdots + \lambda_n x_n^2 = 1 \tag{2-8.3}$$


The fact that n is not given is no handicap since we know the law 
according to which the equation is formed. 

However, in physics and engineering, the problem of an ellipse 
or ellipsoid is usually not encountered in this fashion. If our aim is 
to study the properties of an ellipse, it is justifiable to put our frame 
of reference immediately in a definite relation to that ellipse. The
major and the minor axes of that ellipse can immediately be chosen as
the x and y axes of our frame of reference. But usually this is not the
given situation. We have a given frame of reference, chosen by some
other considerations, and a certain ellipse or ellipsoid may appear
in this frame of reference, but in arbitrarily slanted position. In this
position the equation of the ellipse or the ellipsoid is less simple than
before. In the case of an ellipse, an xy term is present, while in the
case of an ellipsoid, the products xy, yz, and zx will occur. Hence the
equation of an ellipse has now three instead of two terms, the
equation of an ellipsoid six instead of three terms, and in the general
n-dimensional case the appearance of the product terms has the
consequence that the equation is composed of $\tfrac{1}{2}n(n + 1)$ instead of
n terms.

We have to learn how to handle these terms symbolically in order 
to make them operationally available. For this purpose we use a 
very definite method. Although the product xy appears only once,
since xy and yx are equal by the commutative law of multiplication,
we prefer to keep these two terms separated. Hence we will actually
complicate the situation by writing $n^2$ instead of $\tfrac{1}{2}n(n + 1)$ terms, but
now we can trace the inner law of the terms more effectively. The 
equation of an ellipse will be written as follows: 


$$(a_{11}x_1 + a_{12}x_2)x_1 + (a_{21}x_1 + a_{22}x_2)x_2 = 1 \tag{2-8.4}$$

The equation of an ellipsoid becomes

$$(a_{11}x_1 + a_{12}x_2 + a_{13}x_3)x_1 + (a_{21}x_1 + a_{22}x_2 + a_{23}x_3)x_2 + (a_{31}x_1 + a_{32}x_2 + a_{33}x_3)x_3 = 1 \tag{2-8.5}$$

In arbitrary n dimensions, the same equation may be written as
follows:

$$\begin{aligned}
&(a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n)x_1 \\
&\quad + (a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n)x_2 \\
&\quad + \cdots \\
&\quad + (a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n)x_n = 1
\end{aligned} \tag{2-8.6}$$


Since the terms $x_i x_k$ and $x_k x_i$ can be combined into one, it is the sum
$a_{ik} + a_{ki}$ only which influences the equation, and since $a_{ik} + a_{ki}$ is
symmetric with respect to an exchange of the indices i and k, we can
assume from the very beginning that

$$a_{ik} = a_{ki} \tag{2-8.7}$$


A matrix of this kind, which is insensitive with respect to an exchange 
of indices, is called a “symmetric matrix.” For example, the matrix 


$$A = \begin{pmatrix} 3 & 4 & -7 \\ 4 & -5 & 0 \\ -7 & 0 & 2 \end{pmatrix}$$

is symmetric, because $a_{12} = a_{21} = 4$, $a_{13} = a_{31} = -7$, $a_{23} = a_{32} = 0$.

A symmetric matrix has the property that a transposition of rows and
columns leaves the matrix unchanged, and thus

$$\tilde{A} = A \tag{2-8.8}$$


Matrices of this kind have particularly important properties which 
distinguish them within the wider class of arbitrary matrices. 

In matrix notation the equation of a general second-order central 
surface becomes 


$$x \cdot Ax = 1 \tag{2-8.9}$$

The vector $x = (x_1, x_2, x_3)$ in three dimensions and
$x = (x_1, x_2, \ldots, x_n)$ in n dimensions has the significance of the
“radius vector” r, which connects an arbitrary point P of the surface
with the origin O.
It will now be our task to find the principal axes of this quadratic 
surface. This task does not exist if the surface is already given in the 
form 


$$\lambda_1 x_1^2 + \lambda_2 x_2^2 + \cdots + \lambda_n x_n^2 = 1 \tag{2-8.10}$$

since then the existence of the principal axes is taken for granted, and 
they are chosen as the rectangular axes of a Cartesian frame of 
reference. What characterizes specifically the principal axes of a 
quadratic surface? At every point of a surface the “normal” n can 
be constructed, i.e., a vector which is orthogonal to the tangential 
plane of the surface at the point P. The vectors r and n are generally
not parallel to each other. Only in very exceptional directions, 
namely in those directions which we usually choose as coordinate 
axes, does it happen that normal and radius vectors become parallel 
to each other. Hence the name “principal axes” for this particular 
set of directions. 


Now the direction cosines of the normal n are proportional to
Ax. Thus the parallelism of the radius vector x and the normal Ax
is expressed in the equation

$$Ax = \lambda x \tag{2-8.11}$$


This equation was the fundamental equation of eigenvalue analysis. 
We obtained it here by asking for the principal axes of a quadratic 
surface. 

In our previous discussion we obtained the complete solution of
the eigenvalue problem by finding the n values of $\lambda$ for which the
equation is solvable, and the associated n vectors $u_1$, $u_2$, ..., $u_n$ for
which the equation is solvable. The eigenvalues $\lambda_1$, $\lambda_2$, ..., $\lambda_n$ have
a definite geometrical significance. Let us find the point P on the
quadratic surface in which the principal axis intersects the surface.
By the equation of the surface,

$$x \cdot Ax = 1 \tag{2-8.12}$$

By the equation of the principal axes,

$$Ax = \lambda x \tag{2-8.13}$$

Multiplying this equation by x, we obtain

$$x \cdot Ax = \lambda x^2 = 1 \tag{2-8.14}$$

which gives

$$x^2 = 1/\lambda \tag{2-8.15}$$

The significance of

$$x^2 = x_1^2 + x_2^2 + \cdots + x_n^2 \tag{2-8.16}$$

is the square of the distance of the point P in which the principal
axis intersects the surface. Hence $\lambda_i$ is the reciprocal of the square of
the distance of this point P from the center. A large eigenvalue
means that in the direction of a certain principal axis the quadratic 
surface comes near to the center. A small eigenvalue means that in 
the direction of a certain principal axis the surface stays far from the 
center. 
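The relation between eigenvalue and distance can be tried out on a small example. The following Python sketch is ours and not taken from the text: the symmetric matrix, its principal axis directions (1, −1) and (1, 1) with eigenvalues 1 and 3, and the helper names `quad_form` and `intersection` are all illustrative assumptions.

```python
import math

# Hypothetical symmetric matrix; its surface x.Ax = 1 is a slanted ellipse.
A = [[2.0, 1.0], [1.0, 2.0]]

# Principal axis directions of this A with their eigenvalues (A u = lam u).
principal_axes = [(1.0, [1.0, -1.0]), (3.0, [1.0, 1.0])]

def quad_form(M, x):
    """Value of the quadratic form x . M x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

def intersection(M, direction):
    """Point P where the ray along `direction` pierces the surface x . M x = 1."""
    t = 1.0 / math.sqrt(quad_form(M, direction))
    return [t * c for c in direction]

results = []
for lam, u in principal_axes:
    P = intersection(A, u)
    results.append((lam, sum(c * c for c in P)))   # (eigenvalue, squared distance OP)

for lam, dist_sq in results:
    print(lam, dist_sq, 1.0 / lam)
```

The larger eigenvalue belongs to the axis along which the ellipse comes nearer to the center, in agreement with (15).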

The general principal axis problem is greatly simplified in the
present case because of the fact that A is a symmetric matrix and thus
$\tilde{A} = A$. In the general case we have to find separately the principal
axes of A and of $\tilde{A}$. But if A is symmetric, and thus $\tilde{A} = A$, then the
equation

$$Ax = \lambda x$$

solves simultaneously the equation

$$\tilde{A}x = \lambda x$$

The principal axes of A and $\tilde{A}$ now coincide. This has the consequence
that the two matrices U and V, which characterize the general
principal axis problem, become equal:

$$V = U \tag{2-8.17}$$

But then the fundamental equation between these two matrices

$$\tilde{U}V = I$$

is reduced to the equation

$$\tilde{U}U = I \tag{2-8.18}$$


This equation has the following significance. Let us form the dot
products of the various columns of the U matrix with each other.
The dot product of any column with itself gives 1, the dot product of
any column with another column gives 0. This means that the
principal axes are mutually orthogonal, while their length is
normalized to 1. A set of n vectors of this property, placed in an
n-dimensional space, is called an “ortho-normal set.”

We now have the proof that a “general” quadratic surface has n 
(and only n) principal axes and that these axes are orthogonal to each 
other. Hence we can introduce these axes as a new rectangular frame 
of reference and then we arrive at the point where the customary 
analytical geometry starts its investigations, by assuming from the 
beginning that the principal axes of the quadratic surface coincide 
with the rectangular axes of analytic geometry. 

In actual fact we have not proved yet the reality of the principal
axes, since the eigenvalues $\lambda_i$ are the roots of an algebraic equation of
nth order, and generally these roots may be complex numbers. The
fundamental fact holds, however, that all the eigenvalues of a
symmetric matrix of real numbers are real. And if the eigenvalue is
real, then the associated solution of

$$Ax = \lambda x \tag{2-8.19}$$

must also be real. The reality of $\lambda$ can be proved by assuming that
$\lambda$ is complex and showing that this assumption leads to a
contradiction. Indeed, let us multiply (19) by $x^*$, where the notation *
refers to the operation “complex conjugate,” i.e., the change of
i to −i:

$$x^* \cdot Ax = \lambda\, x^* \cdot x \tag{2-8.20}$$

Since any algebraic relation involving complex numbers remains
true if every i is changed to −i, equation (19) has the consequence

$$A^* x^* = \lambda^* x^* \tag{2-8.21}$$

and premultiplying by x, we obtain

$$x \cdot A^* x^* = \lambda^*\, x \cdot x^* \tag{2-8.22}$$

Now we make use of the following fundamental transposition law:

$$x \cdot Ay = y \cdot \tilde{A}x \tag{2-8.23}$$

This law, applied to the left side of (22), gives

$$x^* \cdot \tilde{A}^* x = \lambda^*\, x^* \cdot x \tag{2-8.24}$$




Now, if A is real, then $A^* = A$, since the imaginary unit does not
appear in A, and thus the change of i to −i leaves the matrix
unaltered. Moreover $\tilde{A} = A$, on account of the symmetry of the
matrix. But then

$$x^* \cdot Ax = \lambda^*\, x^* \cdot x \tag{2-8.25}$$

and, taking the difference of (20) and (24), we obtain (the left sides
being equal)

$$(\lambda - \lambda^*)\, x^* \cdot x = 0 \tag{2-8.26}$$

The second factor cannot be zero, since it is the sum of positive
quantities

$$x^* \cdot x = x_1^* x_1 + x_2^* x_2 + \cdots + x_n^* x_n = |x_1|^2 + |x_2|^2 + \cdots + |x_n|^2$$

We thus find

$$\lambda - \lambda^* = 0 \tag{2-8.27}$$

or

$$\lambda = \lambda^* \tag{2-8.28}$$

which means that $\lambda$ must be real.

The reality of the principal axes of a quadratic surface is thus
ascertained. It is important to know that the proof of the reality of
$\lambda$ depends solely on the condition

$$\tilde{A}^* = A \tag{2-8.29}$$


Sometimes we have to solve the principal axis problem associated 
with a matrix with complex coefficients. If this matrix satisfies the 
condition (29), i.e., if the transposition of rows and columns and the 
simultaneous change of i to —i restores the original matrix, we call 
such a matrix “Hermitian.” For example, the following matrix is 
Hermitian. 


$$A = \begin{pmatrix} 3 & 4 + 2i & -7 + 5i \\ 4 - 2i & -5 & -3i \\ -7 - 5i & 3i & 2 \end{pmatrix}$$


The transposition does not leave the matrix unchanged, but a 
transposition and simultaneous change of every i to —i restores the 
original matrix. Although the eigenvectors of such a matrix are no 
longer real but complex vectors, the eigenvalues are still real, and 
the exceptional conditions which hold for real symmetric matrices 
carry over to the realm of Hermitian matrices. These matrices 
correspond in the complex realm to the symmetric matrices in the
real realm. If a matrix has complex coefficients, the symmetry of the
matrix is frequently of small advantage, although it is still true that
the matrix V coincides with U. If A is Hermitian, the relation
between V and U becomes

$$V = U^* \tag{2-8.30}$$

and thus

$$\tilde{U}U^* = I \tag{2-8.31}$$


This, together with the reality of the eigenvalues, preserves the 
outstanding properties of ortho-normal vector systems. 
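A two-rowed Hermitian matrix suffices to watch these statements at work. The Python sketch below is our own illustration: the matrix `H` is a hypothetical example, the eigenvalues are taken from the quadratic characteristic equation, and `eigenvector` relies on a shortcut valid only in this 2×2 case.

```python
import cmath

# Hypothetical Hermitian matrix: transposing and changing i to -i restores it.
H = [[2, 1 - 1j], [1 + 1j, 3]]

# Roots of the characteristic equation lam^2 - trace*lam + det = 0.
trace = H[0][0] + H[1][1]
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
root = cmath.sqrt(trace * trace - 4 * det)
eigenvalues = [(trace - root) / 2, (trace + root) / 2]

def eigenvector(lam):
    """Unnormalized solution of (H - lam*I) x = 0 for this 2x2 case."""
    return [H[0][1], lam - H[0][0]]

def hermitian_dot(x, y):
    """Dot product in which the first factor is replaced by its conjugate."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

u1, u2 = (eigenvector(lam) for lam in eigenvalues)

print(eigenvalues)             # imaginary parts vanish
print(hermitian_dot(u1, u2))   # the two complex axes are orthogonal
```

The eigenvectors are complex, yet the eigenvalues come out real and the axes orthogonal in the sense of (31).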


9. The principal axis transformation of a matrix A. The geo- 
metrical approach to the problem of principal axes puts the emphasis 
on one particular phase of the theory which the purely algebraic 
theory does not reveal so conspicuously. If we picture a matrix in 
association with a quadratic surface, we see at once what the 
meaning of a principal axis is. But beyond that we see immediately 
the possibility of changing our frame of reference. Since the principal 
axes are mutually orthogonal and of the length 1, we can conceive 
them as a natural frame of reference associated with the given 
matrix. It will thus be advisable to study the problem of the 
transformation of coordinates. 

Our original coordinates are

$$x = (x_1, x_2, \ldots, x_n) \tag{2-9.1}$$

Let us introduce a new set of coordinates:

$$\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$$

by changing our original axes to a new set of axes and generating the
radius vector x as a linear combination of these new axes.

The general problem of coordinate transformations appears in the 
following form. Let us give a set of “base vectors”: 


$$u_1, u_2, \ldots, u_n \tag{2-9.2}$$

not necessarily orthogonal to each other and not normalized in
length. Hence in general we want to assume that these vectors $u_i$ are
of arbitrary length and arbitrary direction, satisfying only one
condition, viz., that they are linearly independent. Then the radius
vector x can be generated as a linear superposition of these vectors.

$$x = x_1u_1 + x_2u_2 + \cdots + x_nu_n \tag{2-9.3}$$


In algebra the same vector is written with its components only:

$$x = (x_1, x_2, \ldots, x_n) \tag{2-9.4}$$

considering it an assembly of quantities $x_i$, characterized by a single
subscript.

Now the choice of the base vectors $u_1$, $u_2$, ..., $u_n$ is more or less
accidental. With equal right we could have chosen another set of
vectors

$$\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_n \tag{2-9.5}$$

and obtained the same vector x as a linear superposition of the new
base vectors $\bar{u}_i$:

$$x = \bar{x}_1\bar{u}_1 + \bar{x}_2\bar{u}_2 + \cdots + \bar{x}_n\bar{u}_n \tag{2-9.6}$$


The relation between the two representations can be found as 
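follows. Let us analyze the new vectors $\bar{u}_k$ in terms of the original
vectors $u_i$. Since any vector can be obtained as a linear superposition
of the base vectors, we can put

$$\begin{aligned}
\bar{u}_1 &= a_{11}u_1 + a_{21}u_2 + \cdots + a_{n1}u_n \\
\bar{u}_2 &= a_{12}u_1 + a_{22}u_2 + \cdots + a_{n2}u_n \\
&\;\;\vdots \\
\bar{u}_n &= a_{1n}u_1 + a_{2n}u_2 + \cdots + a_{nn}u_n
\end{aligned} \tag{2-9.7}$$

Writing the same equations in components only, we can put the
components of $\bar{u}_1$, $\bar{u}_2$, ..., $\bar{u}_n$ in successive columns and thus obtain
the following algebraic representation of the given coordinate
transformation.

$$\begin{array}{c|c|c|c}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{array} \tag{2-9.8}$$

Omitting the vertical bars, we obtain the matrix A:

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix} \tag{2-9.9}$$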


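which uniquely characterizes the entire coordinate transformation.
The successive columns of this matrix have the significance of giving
the components of the first, second, ..., nth new base vector, analyzed
in the original reference system. If we introduce in (6) the
transformation (7), we obtain

$$\begin{aligned}
x &= \bar{x}_1(a_{11}u_1 + a_{21}u_2 + \cdots + a_{n1}u_n) \\
&\quad + \bar{x}_2(a_{12}u_1 + a_{22}u_2 + \cdots + a_{n2}u_n) \\
&\quad + \cdots \\
&\quad + \bar{x}_n(a_{1n}u_1 + a_{2n}u_2 + \cdots + a_{nn}u_n) \\
&= (a_{11}\bar{x}_1 + a_{12}\bar{x}_2 + \cdots + a_{1n}\bar{x}_n)u_1 \\
&\quad + (a_{21}\bar{x}_1 + a_{22}\bar{x}_2 + \cdots + a_{2n}\bar{x}_n)u_2 \\
&\quad + \cdots \\
&\quad + (a_{n1}\bar{x}_1 + a_{n2}\bar{x}_2 + \cdots + a_{nn}\bar{x}_n)u_n
\end{aligned} \tag{2-9.10}$$

Originally the vector x was expressed in the form

$$x = x_1u_1 + x_2u_2 + \cdots + x_nu_n \tag{2-9.11}$$

The new form (10) must coincide with (11), since a vector cannot
have two different representations in one and the same set of axes,
if the axes are linearly independent. This yields

$$\begin{aligned}
x_1 &= a_{11}\bar{x}_1 + a_{12}\bar{x}_2 + \cdots + a_{1n}\bar{x}_n \\
x_2 &= a_{21}\bar{x}_1 + a_{22}\bar{x}_2 + \cdots + a_{2n}\bar{x}_n \\
&\;\;\vdots \\
x_n &= a_{n1}\bar{x}_1 + a_{n2}\bar{x}_2 + \cdots + a_{nn}\bar{x}_n
\end{aligned} \tag{2-9.12}$$

or in matrix notation,

$$x = A\bar{x} \tag{2-9.13}$$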
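The passage from the new components to the old ones can be sketched in Python. The example is ours: the two new base vectors are hypothetical, the original axes are assumed to be the ordinary rectangular ones, and `matvec` is merely a helper name.

```python
def matvec(M, v):
    """Product of a matrix (list of rows) with a column vector."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]

# Hypothetical new base vectors, analyzed in the original reference system.
new_base = [[1, 0], [1, 1]]

# The base vectors become the successive columns of the matrix A.
A = [list(column) for column in zip(*new_base)]

x_bar = [2, 3]            # components in the new system
x = matvec(A, x_bar)      # components in the original system, x = A x_bar

# The same vector, built directly as a superposition of the new base vectors.
x_direct = [sum(x_bar[k] * new_base[k][i] for k in range(2)) for i in range(2)]

print(A, x, x_direct)
```

Both routes give the same components, which is the content of (12) and (13).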


It will be advisable to change our notations slightly. Since the
original vectors $u_1$, $u_2$, ..., $u_n$ do not appear in the final formulation,
we may omit the bar over the $\bar{u}_1$, $\bar{u}_2$, ..., $\bar{u}_n$ and call the new base
vectors simply $u_1$, $u_2$, ..., $u_n$. Moreover, the components of these
vectors, analyzed in the original frame of reference, can be
conveniently denoted by the symbol $u_{ik}$ instead of $a_{ik}$. Hence the
transformation matrix should be denoted by U, with the
understanding that the successive columns of this matrix represent the
components of $u_1$, $u_2$, ..., $u_n$. The transformation equation now
becomes

$$x = U\bar{x} \tag{2-9.14}$$


If the original base vectors of our coordinate system represent a
customary rectangular set of axes, this will generally not be true any
more of the new vectors $u_1$, $u_2$, ..., $u_n$. We may want, however, to
preserve the rectangular character of our reference system. Then we
have to set a condition on the transformation equation (14). The
new vectors $u_1$, $u_2$, ..., $u_n$ have to be mutually orthogonal
and their length must be normalized to 1. This means that the
dot product of any two different u vectors must come out as 0, while the dot
product of any u vector with itself must come out as 1. All these
conditions are included in the single matrix equation

$$\tilde{U}U = I \tag{2-9.15}$$


This equation coincides with the equation (8.18) established earlier 
in the problem of finding the principal axes of a quadratic surface. 
The principal axes of a quadratic surface represent a set of base 
vectors which are automatically rectangular and which can thus be 
introduced as a new reference system. Let us see what happens to 
the quadratic surface as a result of this transformation. We put 


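$$x = U\bar{x} \tag{2-9.16}$$

and introduce this transformation in the defining equation of the
quadratic surface:

$$(U\bar{x}) \cdot (AU\bar{x}) = 1 \tag{2-9.17}$$

We make use of the transposition law (8.23):

$$x \cdot Ay = y \cdot \tilde{A}x$$

which, applied to our problem, gives

$$\bar{x} \cdot \widetilde{(AU)}\,U\bar{x} = 1 \tag{2-9.18}$$

Since, however (cf. 7.7),

$$\widetilde{(AU)} = \tilde{U}\tilde{A} = \tilde{U}A \tag{2-9.19}$$

we obtain

$$\bar{x} \cdot \tilde{U}AU\bar{x} = 1 \tag{2-9.20}$$

Now by the definition of the principal axes

$$AU = U\Lambda \tag{2-9.21}$$

Premultiplying by $\tilde{U}$, we obtain

$$\tilde{U}AU = \tilde{U}U\Lambda = \Lambda \tag{2-9.22}$$

and thus

$$\bar{x} \cdot \Lambda\bar{x} = 1 \tag{2-9.23}$$

which means

$$\lambda_1\bar{x}_1^2 + \lambda_2\bar{x}_2^2 + \cdots + \lambda_n\bar{x}_n^2 = 1 \tag{2-9.24}$$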


We thus obtain the traditional form of a quadratic surface as it 
appears in analytic geometry, where we assume in advance that the 
principal axes of the given quadratic surface are chosen as the axes 
of a rectangular reference system. 
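The transformation to the principal axes may be followed numerically on a small symmetric matrix. The Python sketch below is an illustration of ours, not the author's computation: the matrix A and its ortho-normal axes are assumed for the example, and `matmul` and `transpose` are hypothetical helper names.

```python
import math

def matmul(P, Q):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

def transpose(M):
    """Exchange rows and columns."""
    return [list(row) for row in zip(*M)]

s = 1.0 / math.sqrt(2.0)
A = [[2.0, 1.0], [1.0, 2.0]]      # hypothetical symmetric matrix
U = [[s, s], [-s, s]]             # columns: principal axes (s, -s) and (s, s)

orthonormality = matmul(transpose(U), U)         # should be the unit matrix
Lam = matmul(transpose(U), matmul(A, U))         # should be diagonal: 1 and 3

print(orthonormality)
print(Lam)
```

In the new reference system the surface thus reads $\bar{x}_1^2 + 3\bar{x}_2^2 = 1$, the traditional form of an ellipse.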

The fact that in the new reference system the matrix A is replaced 
by the diagonal matrix $\Lambda$ shows that the transformation to the
principal axes had a profound influence on A by transforming the 
matrix into a particularly desirable form, viz., a purely diagonal 
form. The operation with diagonal matrices is infinitely simpler 
than the operation with arbitrary matrices. The equations are 
separated in the unknowns and immediately solvable. On the other 
hand, this diagonalization of the matrix requires the knowledge of 
the principal axes, which is generally not easily accomplished. 
However, the method of coordinate transformation is nevertheless a 
fundamentally important tool of matrix analysis. We may have a 
method of obtaining the principal axes of a matrix in rather crude 
approximation. Then the operation $\tilde{U}AU$ will not turn A into a
diagonal form, but it will boost up the diagonal terms in comparison 
to the other terms. Such a matrix is still of great numerical advantage 
because matrix problems associated with such a matrix can be 
solved numerically in a series of quickly convergent iterations. 

A special discussion is demanded in connection with multiple 
roots. The eigenvalues $\lambda_i$ were obtained by solving an algebraic
equation of the order n. Such an equation has always n roots but it 
may happen that some of the roots collapse into one. In that case 
we speak of a “multiple root” because that root stands for several 
distinct roots. The case of collapsing roots can be conceived as a 
limit, starting out with distinct roots and letting them approach each 
other indefinitely. Here again the geometrical picture helps us 
understand the nature of multiple roots. In the equation of an
ellipse the equality of $\lambda_1$ and $\lambda_2$ brings about the equation of a circle:

$$\lambda(x_1^2 + x_2^2) = 1$$

In space the equality of two $\lambda$ values generates a rotational ellipsoid:

$$\lambda_1(x_1^2 + x_2^2) + \lambda_3 x_3^2 = 1$$

and a sphere is generated if all the three $\lambda$ values are equal:

$$\lambda(x_1^2 + x_2^2 + x_3^2) = 1$$


Now, if an ellipse becomes gradually a circle, its principal axes do not 
cease to exist. They become two mutually perpendicular diameters 
of the circle. However, the same circle may be the limiting position 
of an infinity of ellipses and thus any two mutually perpendicular 
diameters of the circle may serve as principal axes. In a similar way 
in the case of a sphere, any three mutually perpendicular diameters 
may serve as principal axes of that sphere. The same holds in higher 
dimensions. If in the general n-dimensional case m eigenvalues 
collapse into one, this means that in a certain m-dimensional 
“subspace” spherical conditions prevail. Any m mutually ortho- 
gonal axes can be chosen within that subspace as principal axes of 
the quadratic surface. The existence of multiple roots does not 
invalidate the existence of distinct and mutually perpendicular axes. 
What happens is only that some of these axes are no longer uniquely 
determined but can be replaced by other equally valid axes. The 
collapse of certain eigenvalues into one is not connected with a 
corresponding collapse of the associated axes. The mutual ortho- 
gonality of the principal axes prevents them from ever collapsing 
into one. 
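The freedom of choice within the subspace of a multiple root can be made visible by a small experiment. The following Python sketch is ours: a hypothetical matrix with the double root 2 is taken, and the axes are turned by an arbitrary angle inside the plane belonging to that root; `rotated_axes` and `transformed` are illustrative names.

```python
import math

def matmul(P, Q):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

def transpose(M):
    """Exchange rows and columns."""
    return [list(row) for row in zip(*M)]

# Hypothetical rotational ellipsoid: the eigenvalue 2 is a double root.
A = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 5.0]]

def rotated_axes(theta):
    """Ortho-normal axes turned by theta inside the plane of the double root."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transformed(theta):
    """The matrix U~ A U for the rotated set of axes."""
    U = rotated_axes(theta)
    return matmul(transpose(U), matmul(A, U))

for theta in (0.3, 1.0, 2.5):
    print(transformed(theta))   # remains diagonal (up to rounding) for every angle
```

Every choice of the angle yields an equally valid set of principal axes, while the third axis stays uniquely determined.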

Numerically the multiplicity of roots is always the cause of certain 
difficulties. If certain eigenvalues come very near together without 
collapsing into one, the associated principal axes are theoretically 
still uniquely determined. But to find these axes with any degree of 
accuracy becomes increasingly difficult as the difference between two 
eigenvalues decreases to smaller and smaller amounts. 
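How strongly the axes react to small errors when two eigenvalues are close can be sketched as follows. The example is our own and rests on one assumption not stated in the text: for the symmetric two-rowed matrix with diagonal elements a, d and off-diagonal element b, the classical relation tan 2θ = 2b/(a − d) gives the angle θ of a principal axis. The function name `axis_angle` and the numbers are hypothetical.

```python
import math

def axis_angle(gap, coupling):
    """Angle (radians) of a principal axis of [[1 + gap, coupling], [coupling, 1]].

    Uses tan(2*theta) = 2*b / (a - d) for the symmetric matrix [[a, b], [b, d]].
    Without the coupling the axis would lie at the angle zero.
    """
    return 0.5 * math.atan2(2.0 * coupling, gap)

disturbance = 1e-6   # hypothetical small error in the off-diagonal element

well_separated = axis_angle(1.0, disturbance)    # eigenvalues 2 and 1
nearly_equal = axis_angle(1e-6, disturbance)     # eigenvalues 1.000001 and 1

print(math.degrees(well_separated), math.degrees(nearly_equal))
```

The same disturbance that leaves the axes of the well-separated case practically untouched turns the axes of the nearly degenerate case by a large angle.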


10. Skew-angular reference systems. In our previous discussions 
we have assumed that the quadratic surface is in a slanted position 
relative to the axes of our reference system. Then the principal axes 
of the quadratic surface were obtained and these axes introduced as a 
new frame of reference. This transformation of the coordinates had 
the effect that the symmetric matrix A was transformed into a pure
diagonal matrix $\Lambda$. We will now go one step further. Up to now we
have assumed that we operate with a rectangular set of coordinates. 
The axes of our reference system did not coincide with the axes of 
the quadratic surface and this necessitated a transformation of the 
coordinates. But both the original coordinate axes and the principal
axes of the quadratic surface were rectangular axes. The trans- 
formation involved was a mere rotation of the axes, i.e., an ortho- 
gonal transformation. Such an orthogonal transformation is 
characterized by the matrix equation 


$$\tilde{U}U = I$$


The principal axes of a quadratic surface automatically satisfied this 
equation. 

It can happen, however, that we encounter a still more general 
situation. The base vectors $u_1$, $u_2$, ..., $u_n$ of our original reference
system may not be orthogonal to each other. We then have a
“skew-angular” set of axes. In this case the quadratic surface must
of necessity be in a slanted position relative to our axes, because the
principal axes of the quadratic surface remain orthogonal to each
other and thus we are sure that our skew-angular axes cannot coincide
with the principal axes of the quadratic surface.

Since in ordinary analytical geometry we use almost exclusively a 
rectangular set of axes, our first task will be to investigate quite 
generally the operation with skew-angular axes. We assume that the 
base vectors $u_1$, $u_2$, ..., $u_n$ are given as a set of n vectors of arbitrary
direction and arbitrary length, except for one condition, viz., that
these vectors shall be linearly independent. This means that no
vector can be obtained as a linear combination of the other vectors.
In other words a linear relation of the form (other than for each $\alpha_i$
equal to zero),

$$\alpha_1u_1 + \alpha_2u_2 + \cdots + \alpha_nu_n = 0 \tag{2-10.1}$$

is excluded, since such a relation would imply that one vector, e.g.,
$u_n$ if $\alpha_n$ is different from zero, could be obtained as a linear
superposition of the other vectors.




As a consequence of this linear independence, we see at once that 
an arbitrary vector x, obtained as a linear superposition of the base 
vectors 


$$x = x_1u_1 + x_2u_2 + \cdots + x_nu_n \tag{2-10.2}$$

cannot have two different representations. If the same vector x
could be obtained in a different way:

$$x = \xi_1u_1 + \xi_2u_2 + \cdots + \xi_nu_n \tag{2-10.3}$$

then the difference of (2) and (3) gives

$$(x_1 - \xi_1)u_1 + (x_2 - \xi_2)u_2 + \cdots + (x_n - \xi_n)u_n = 0 \tag{2-10.4}$$


This, however, would establish a linear relation of the form (1), 
which was excluded in advance. The representation (2) of the 
vector x is thus unique. 

We can interpret (2) in the sense of a synthesis. Given the base 
vectors $u_1$, $u_2$, ..., $u_n$, we obtain a vector by multiplying each one of
the base vectors by $x_1$, $x_2$, ..., $x_n$, and adding them up in the sense of
vector addition. But frequently the inverse of the problem is
encountered. The vector x is given, and we have to find out what
linear superposition of the base vectors will generate that particular
vector. We say that we analyze the vector x in the reference system
of the base vectors $u_1$, $u_2$, ..., $u_n$. Here x is given, and we have to
find the coefficients $x_1$, $x_2$, ..., $x_n$ of the linear superposition problem
(2). These coefficients are called the “components” of the vector x
in the reference system of the base vectors $u_1$, $u_2$, ..., $u_n$.

From the standpoint of this analysis, the rectangular systems are
vastly superior to the skew-angular systems. If the vectors
$u_1, u_2, \ldots, u_n$ form an orthogonal and normalized set of vectors,
they satisfy the equations

$$u_i \cdot u_k = 0 \quad (i \neq k), \qquad u_i^2 = 1$$    (2-10.5)

These equations express in algebraic form the geometrical facts that
any two vectors of the set are mutually orthogonal and that the
length of any vector of the set is 1. In this reference system the
“dotting” of the vector x with the unit vector $u_i$ gives

$$x \cdot u_i = (x_1 u_1 + x_2 u_2 + \cdots + x_n u_n) \cdot u_i = x_i$$

and thus

$$x_i = x \cdot u_i$$    (2-10.6)

The mere multiplication of the vector x by the vector $u_i$ gives the
scalar $x_i$, which is directly the component of x in the direction of
the base vector $u_i$.

The same equation is not true, however, in a skew-angular
reference system. Here the independence with which each one of the
base vectors $u_i$ operates is lost, and we have to find some other tools
for replacing the simple equation (6). We do that by constructing a
new set of vectors $v_1, v_2, \ldots, v_n$, called the “adjoint” set.
Everything we do with rectangular axes can be duplicated with the help
of skew-angular axes, provided only that we enlarge the given set of
vectors to double capacity. Instead of operating with the single set
of vectors $u_1, u_2, \ldots, u_n$, we operate with a double set of vectors.

$$\begin{matrix} u_1, u_2, \ldots, u_n \\ v_1, v_2, \ldots, v_n \end{matrix}$$    (2-10.7)


Every vector $u_i$ is associated with a corresponding vector $v_i$, called
the “conjugate” of $u_i$. The vectors $v_i$ and $u_i$ are in a definite
duality relation to each other, which holds symmetrically in both
directions. If the vectors $u_i$ are given, their “adjoints” are the
vectors $v_i$. If, on the other hand, we start with the vectors
$v_1, v_2, \ldots, v_n$ as base vectors and construct their adjoints, we
obtain the vectors $u_1, u_2, \ldots, u_n$. Hence the adjoint of the
adjoint set is the original set.

The adjoint vectors $v_1, v_2, \ldots, v_n$ are uniquely defined. They are
in a biorthogonality relation to the original vectors
$u_1, u_2, \ldots, u_n$ in the sense that any vector $u_i$ of the one set
and any vector $v_k$ of the other set (excluding its own conjugate $v_i$)
are mutually orthogonal. Moreover, the lengths of the adjoint vectors
$v_i$ are normalized by the condition that the dot product $u_i \cdot v_i$
of any vector with its own conjugate shall be 1:

$$u_i \cdot v_k = 0 \quad (i \neq k), \qquad u_i \cdot v_i = 1$$    (2-10.8)


We see that the advantage of a rectangular set of vectors lies in the
fact that the necessity of constructing a second set of adjoint vectors
is obviated because there the vectors $u_i$ themselves satisfy the
orthogonality relations (5), and this means that the vectors $v_i$
coincide with the vectors $u_i$:

$$v_i = u_i$$    (2-10.9)

A rectangular and normalized set of vectors is therefore called
“self-adjoint.” For any other set of vectors the set $v_i$ has to be
specifically constructed. However, after this construction is done,
we can operate with a skew-angular reference system just as easily
and effectively as with an orthogonal system.

For this reason we will assume that whenever a skew-angular
reference system $u_1, u_2, \ldots, u_n$ is given, we have automatically
associated with it the adjoint system $v_1, v_2, \ldots, v_n$, and both
vector systems are always considered together and not the one or the
other system alone.

$$\left.\begin{matrix} u_1, u_2, \ldots, u_n \\ v_1, v_2, \ldots, v_n \end{matrix}\right\} \qquad u_i \cdot v_k = 0 \quad (i \neq k), \qquad u_i \cdot v_i = 1$$    (2-10.10)

The problem of analyzing a vector x in the reference system of the
base vectors $u_i$ has now a simple solution. We multiply the equation

$$x = x_1 u_1 + x_2 u_2 + \cdots + x_n u_n$$

by the vector $v_i$ and obtain, on the basis of the biorthogonality
relations (8),

$$x \cdot v_i = (x_1 u_1 + x_2 u_2 + \cdots + x_n u_n) \cdot v_i = x_i$$

and thus

$$x_i = x \cdot v_i$$    (2-10.11)

The dotting of x with the conjugate vector $v_i$ gives the factor $x_i$,
i.e., the component of x in the direction of the base vector $u_i$.
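
The construction of the adjoint set and the analysis $x_i = x \cdot v_i$ can be tried out numerically. The following is only a minimal sketch under stated assumptions: the base vectors are written in ordinary rectangular components, the helper names `dot` and `adjoint_set` are hypothetical, and the two-dimensional numbers are chosen purely for illustration. The adjoint vectors are obtained by inverting the matrix whose rows are the $u_i$; the columns of the inverse then satisfy the biorthogonality relations.

```python
from fractions import Fraction


def dot(a, b):
    """Sum of products of corresponding components."""
    return sum(p * q for p, q in zip(a, b))


def adjoint_set(base):
    """Return vectors v_k with u_i . v_k = 0 (i != k) and u_i . v_i = 1.

    Gauss-Jordan elimination on [M | I], where the rows of M are the u_i.
    Column k of the inverse of M is the adjoint vector v_k.
    """
    n = len(base)
    rows = [[Fraction(c) for c in u] + [Fraction(int(i == k)) for k in range(n)]
            for i, u in enumerate(base)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("base vectors are linearly dependent")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [c / p for c in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    inverse = [row[n:] for row in rows]
    return [[inverse[i][k] for i in range(n)] for k in range(n)]


u = [[1, 0], [1, 1]]      # illustrative skew-angular base vectors
v = adjoint_set(u)        # their adjoints
x = [3, 2]                # vector to be analyzed (rectangular components)

components = [dot(x, vk) for vk in v]          # x_i = x . v_i
rebuilt = [sum(c * ui[j] for c, ui in zip(components, u)) for j in range(2)]
print(v, components, rebuilt)
```

The synthesis with the components found in this way returns the original vector, which is the content of the uniqueness statement made earlier.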

Now the vectors $u_i$ and $v_i$ are so closely associated that one vector
system has no meaning without the other. If we used the $u_i$ as a base
system for analyzing the vector x, we can use with equal justification
the vectors $v_i$.

$$x = \xi_1 v_1 + \xi_2 v_2 + \cdots + \xi_n v_n$$    (2-10.12)

Hence every vector has two representations in a skew-angular
reference system, once denoted by Latin and once denoted by Greek
letters. If (12) is dotted by $u_i$, we obtain

$$\xi_i = x \cdot u_i$$

The vector x is the same in both cases; only the reference system has
changed in which the analysis was made. We do not have a dual pair
of vectors, but two conjugate representations of one and the same
vector, analyzed once in the original and once in the adjoint system.

$$x = x_1 u_1 + x_2 u_2 + \cdots + x_n u_n = \xi_1 v_1 + \xi_2 v_2 + \cdots + \xi_n v_n$$    (2-10.13)


However, for algebraic operations we omit the unit vectors
$u_1, u_2, \ldots, u_n$ and retain the components only. We thus write

$$x = (x_1, x_2, \ldots, x_n)$$    (2-10.14)

Similarly, if we want to operate with the adjoint set of axes, we will
omit the base vectors $v_1, v_2, \ldots, v_n$ and write solely the
components $\xi_1, \xi_2, \ldots, \xi_n$. But these components are
completely different from the previous components, and thus the
resultant vector cannot be called x any more, but $\xi$.

$$\xi = (\xi_1, \xi_2, \ldots, \xi_n)$$    (2-10.15)

While this notation is entirely logical and consistent as an algebraic
notation, it is necessary to keep in mind that the two algebraic
vectors x and $\xi$ are not more than two parallel representations of one
single geometrical entity, viz., the vector x, on account of the two
dual reference systems with which we have to operate.

In the algebraic sense the vector x has to be written down as the
pair of vectors $(x, \xi)$. In the customary rectangular reference systems
of analytic geometry this doubling of the components does not occur,
since the vector $\xi$ and the vector x coincide. But in a skew-angular
system the vector x is necessarily associated with the dual vector $\xi$.

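We will now form the dot product of the two vectors x and y.

$$x = x_1 u_1 + x_2 u_2 + \cdots + x_n u_n = \xi_1 v_1 + \xi_2 v_2 + \cdots + \xi_n v_n$$
$$y = y_1 u_1 + y_2 u_2 + \cdots + y_n u_n = \eta_1 v_1 + \eta_2 v_2 + \cdots + \eta_n v_n$$

In view of the duality relations between the $u_i$ and $v_i$ axes, it is
inevitable that the one vector must be used in the one, and the other
in the other representation. We cannot form the products $x \cdot y$ or
$\xi \cdot \eta$, but we can form the products $x \cdot \eta$ or $\xi \cdot y$.

$$x \cdot \eta = (x_1 u_1 + x_2 u_2 + \cdots + x_n u_n) \cdot (\eta_1 v_1 + \eta_2 v_2 + \cdots + \eta_n v_n) = x_1 \eta_1 + x_2 \eta_2 + \cdots + x_n \eta_n$$
$$\xi \cdot y = (\xi_1 v_1 + \xi_2 v_2 + \cdots + \xi_n v_n) \cdot (y_1 u_1 + y_2 u_2 + \cdots + y_n u_n) = \xi_1 y_1 + \xi_2 y_2 + \cdots + \xi_n y_n$$

In both cases we obtain the same dot product of the vector x and the
vector y and thus:

$$x \cdot \eta = \xi \cdot y$$    (2-10.16)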
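
The rule that a dot product must pair one factor from the u system with one from the v system can be checked on a small case. This is a sketch only: the base vectors, their adjoints, and the two vectors are illustrative values given in rectangular components, and the variable names are hypothetical. The "geometric" value is the ordinary dot product of the two vectors in rectangular components, against which the four component products are compared.

```python
def dot(a, b):
    """Sum of products of corresponding components."""
    return sum(p * q for p, q in zip(a, b))


u = [(1, 0), (1, 1)]       # illustrative skew-angular base vectors
v = [(1, -1), (0, 1)]      # their adjoints: u_i . v_k = 0 (i != k), u_i . v_i = 1

x_rect, y_rect = (3, 2), (1, 4)    # two vectors in rectangular components

x = [dot(x_rect, vk) for vk in v]      # components in the u system
xi = [dot(x_rect, uk) for uk in u]     # components in the v system
y = [dot(y_rect, vk) for vk in v]
eta = [dot(y_rect, uk) for uk in u]

geometric = dot(x_rect, y_rect)
mixed = (dot(x, eta), dot(xi, y))      # dual representations
unmixed = (dot(x, y), dot(xi, eta))    # both factors in the same system
print(geometric, mixed, unmixed)       # 11 (11, 11) (5, 28)
```

Only the mixed products reproduce the geometric value; the two unmixed sums differ from it and from each other.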


The ordinary definition of the dot product of two vectors in the sense
of multiplying corresponding components and forming the sum

$$a = (a_1, a_2, \ldots, a_n)$$
$$b = (b_1, b_2, \ldots, b_n)$$
$$a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$

remains valid without any alteration. But we have to know that the
dot product of two vectors can only be formed if the two factors are
given in dual representations. The dot products

$$x \cdot y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$
$$\xi \cdot \eta = \xi_1 \eta_1 + \xi_2 \eta_2 + \cdots + \xi_n \eta_n$$

have no meaning whatsoever. One factor has to be analyzed in the
u system, the other factor in the v system. (In tensor calculus the
components $x_1, x_2, \ldots, x_n$ of a vector are called the “covariant
components,” and the components $\xi_1, \xi_2, \ldots, \xi_n$ of the same
vector are called the “contravariant components” of the vector x. The
distinction between the two sets of components is not made by using
Latin and Greek letters, but by putting the subscripts once in an
upper and once in a lower position.

$$x_i = (x_1, x_2, \ldots, x_n) \quad \text{(covariant vector)}$$
$$x^i = (x^1, x^2, \ldots, x^n) \quad \text{(contravariant vector)}$$

The dot product of the vectors x and y becomes

$$x \cdot y = x^1 y_1 + x^2 y_2 + \cdots + x^n y_n = x_1 y^1 + x_2 y^2 + \cdots + x_n y^n$$

This notation is less suited, however, for matrix operations.)
11. Principal axis transformation in skew-angular systems. We
assume that a quadratic surface is given in a skew-angular frame of
reference. The base vectors are

$$\begin{matrix} u_1, u_2, \ldots, u_n \\ v_1, v_2, \ldots, v_n \end{matrix}$$    (2-11.1)

The radius vector x is given in dual fashion:

$$x = (x_1, x_2, \ldots, x_n), \qquad \xi = (\xi_1, \xi_2, \ldots, \xi_n)$$    (2-11.2)

The equation of a quadratic surface in ordinary rectangular
coordinates has been

$$x \cdot Ax = 1$$    (2-11.3)

In a skew-angular reference system the dot product can be formed
only in dual representation, giving one vector in the base system and
the other vector in the adjoint system. Hence now we have

$$\xi \cdot Ax = 1$$    (2-11.4)

or written out in detail,

$$\begin{aligned} &\xi_1 (a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n) \\ +\; &\xi_2 (a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n) \\ +\; &\cdots \\ +\; &\xi_n (a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n) = 1 \end{aligned}$$    (2-11.5)

These are truly $n^2$ different terms which cannot be reduced to
$n(n + 1)/2$ terms, because the terms $a_{ik} \xi_i x_k$ and $a_{ki} \xi_k x_i$
do not combine any more. The matrix A is no longer symmetric but can be
given as an arbitrary matrix with $n^2$ different elements.

The principal axes of the quadratic surface are once more those
directions in which normal and radius vector become parallel. The
direction cosines of the normal are once more proportional to Ax,
and the proportionality of normal and radius vector is once more
expressed in the equation

$$Ax = \lambda x$$    (2-11.6)

However, every vector has a dual aspect by being representable in
both the original base vectors $u_i$ and the adjoint vectors $v_i$. Hence
we want to obtain the equations of the principal axes not only in the
original system but also in the adjoint system. This means that we
want to add an equation which will determine $\xi$ instead of x. Now by
the algebraic identity

$$x \cdot \tilde{A} y = y \cdot Ax$$

we can write the basic equation (4) in the alternate form

$$x \cdot \tilde{A} \xi = 1$$    (2-11.7)

and, applying the equation (6) to the new form, we now obtain

$$\tilde{A} \xi = \lambda \xi$$    (2-11.8)

We have encountered the equations (6) and (8) earlier, in connection
with the eigenvalue problem of the matrix A and its transpose $\tilde{A}$.
In the purely algebraic treatment, the principal axes of A and $\tilde{A}$
separate into n + n independent vectors. We have altogether 2n
vectors, and we have seen that these two sets of vectors are in a
biorthogonality relation to each other. In the geometrical treatment
we have in fact only n vectors, namely, the n principal axes of a
quadratic surface. But these n vectors have to be analyzed in two
reference systems, viz., the u system and the v system. And thus every
one of the principal axes has two sets of components, viz., the
components $x_1, x_2, \ldots, x_n$ in the u system and the components
$\xi_1, \xi_2, \ldots, \xi_n$ in the v system. Instead of 2n vectors we have
merely n vectors, but each vector represented in two ways. If these
vectors are once more called

$$\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_n$$

and we use once more the notations of the previous section, we can
put the components of these vectors in successive columns of a
matrix.

$$\begin{array}{c|c|c|c} \bar{u}_1 & \bar{u}_2 & \cdots & \bar{u}_n \\ \hline u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & & \vdots \\ u_{n1} & u_{n2} & \cdots & u_{nn} \end{array}$$    (2-11.9)

This means in terms of vectors,

$$\begin{aligned} \bar{u}_1 &= u_{11} u_1 + u_{21} u_2 + \cdots + u_{n1} u_n \\ &\;\;\vdots \\ \bar{u}_n &= u_{1n} u_1 + u_{2n} u_2 + \cdots + u_{nn} u_n \end{aligned}$$    (2-11.10)

But the very same vectors can likewise be analyzed in the v system.

$$\begin{aligned} \bar{u}_1 &= v_{11} v_1 + v_{21} v_2 + \cdots + v_{n1} v_n \\ &\;\;\vdots \\ \bar{u}_n &= v_{1n} v_1 + v_{2n} v_2 + \cdots + v_{nn} v_n \end{aligned}$$    (2-11.11)

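Now we abandon the reference to the original base vectors altogether.
Then we can omit the bars and call the new base vectors simply
$u_1, u_2, \ldots, u_n$. These vectors are algebraically characterized by
the matrix U, whose successive columns are $u_1, u_2, \ldots, u_n$:

$$U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & & \vdots \\ u_{n1} & u_{n2} & \cdots & u_{nn} \end{pmatrix}$$    (2-11.12)

But the very same vectors, analyzed in the adjoint system, represent
algebraically a second set of vectors, to be denoted by
$v_1, v_2, \ldots, v_n$, which form the successive columns of the matrix V:

$$V = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1n} \\ v_{21} & v_{22} & \cdots & v_{2n} \\ \vdots & \vdots & & \vdots \\ v_{n1} & v_{n2} & \cdots & v_{nn} \end{pmatrix}$$    (2-11.13)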


The columns of U and the columns of V belong to the same set of n
vectors. The fact that any two of the principal axes are orthogonal to
each other means that the dot product of any U column with any
V column except its own gives zero. Moreover, the fact that the
length of the principal axes is normalized to 1 means that the dot
product of any U column with the corresponding V column gives 1.
As a matrix equation,

$$\tilde{U} V = I$$    (2-11.14)

We have encountered this equation before as the result of algebraic
operations. The new insight we gain by the geometric treatment is
that this equation expresses the orthogonality of the principal axes
of a quadratic surface. In the earlier section this orthogonality
appeared in the form

$$\tilde{U} U = I$$

and the matrix A was symmetric. The more general equation (14)
comes about because the base vector system of our analysis is no
longer rectangular but skew-angular.

We will now introduce the principal axes of the quadratic surface as
a new reference system. Here the duality of the original and the
adjoint system can be abandoned because the new reference system is
“ortho-normal” (orthogonal and normalized in length) and therefore
self-adjoint. The transformation to the new axes is given by the
equations

$$x = U\bar{x}, \qquad \xi = V\bar{x}$$    (2-11.15)

Introducing this transformation in (4), we obtain

$$V\bar{x} \cdot AU\bar{x} = 1$$    (2-11.16)

and by transposition,

$$\bar{x} \cdot \widetilde{(AU)}\, V\bar{x} = \bar{x} \cdot \tilde{U}\tilde{A}V\bar{x} = 1$$    (2-11.17)

Since, however, by the definition of the matrix V,

$$\tilde{A}V = V\Lambda$$    (2-11.18)

premultiplication by $\tilde{U}$ gives

$$\tilde{U}\tilde{A}V = \tilde{U}V\Lambda = \Lambda$$    (2-11.19)

and thus, according to (17), the equation of the quadratic surface in
the new reference system becomes

$$\bar{x} \cdot \Lambda\bar{x} = 1$$    (2-11.20)

This equation coincides with the previous result (9.23) of
transforming a quadratic surface to its principal axes. In the previous
problem we started with a symmetric matrix, while now our starting
point was an arbitrary matrix. The final result must be the same
since in both cases the quadratic surface involved was the same and
in the final transformation, after introducing the principal axes of the
surface itself, all traces of the previous set of axes disappear, and thus
it can make no difference whether we have started with a rectangular
or a skew-angular set of axes.

In final analysis we can say that the principal axis problem of
quadratic surfaces, analyzed in a skew-angular frame of reference,
demonstrates that even nonsymmetric matrices can be transformed
into a purely diagonal form. This transformation appears in the form

$$\tilde{V}AU = \Lambda \qquad \text{or also} \qquad \tilde{U}\tilde{A}V = \Lambda$$
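
The diagonalization of a nonsymmetric matrix by the two biorthogonal sets can be verified on a small example. The sketch below assumes an illustrative 2 x 2 matrix with the distinct eigenvalues 2 and 3; the columns of `U` are eigenvectors of the matrix, the columns of `V` are eigenvectors of its transpose, scaled so that the transpose of U times V is the unit matrix. The helper names are hypothetical.

```python
def matmul(p, q):
    """Ordinary row-by-column matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def transpose(m):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*m)]


A = [[2, 1],
     [0, 3]]              # nonsymmetric, eigenvalues 2 and 3
U = [[1, 1],
     [0, 1]]              # columns (1, 0) and (1, 1): eigenvectors of A
V = [[1, 0],
     [-1, 1]]             # columns (1, -1) and (0, 1): eigenvectors of transpose(A)

biorthogonality = matmul(transpose(U), V)                  # should be the unit matrix
diag_1 = matmul(transpose(V), matmul(A, U))                # transpose(V) A U
diag_2 = matmul(transpose(U), matmul(transpose(A), V))     # transpose(U) transpose(A) V
print(biorthogonality, diag_1, diag_2)
```

Both products give the same diagonal matrix, whose diagonal carries the eigenvalues in the order in which the eigenvectors were arranged as columns.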


There is, however, one fundamental difference which distinguishes
the skew-angular case from the rectangular one. In the rectangular
case ($\tilde{A} = A$), we could prove that all the eigenvalues and
eigenvectors are real. Hence the rotation from one frame of reference to
the other was always a real rotation within the real n-dimensional
space. In the general case, this is no longer so. The eigenvalues
$\lambda_i$ are generally complex numbers, and thus the elements of the
matrix $\Lambda$ are generally complex elements. Similarly the elements of
the matrices U and V are generally likewise complex numbers.

A second fundamental difference has something to do with the
distinctness of the eigenvectors. As long as the eigenvalues
$\lambda_i$ are distinct, the corresponding eigenvectors are also distinct.
In the case of multiple eigenvalues, however, special conditions
prevail. In the symmetric case we have seen that the collapse of the
eigenvalues did not mean a corresponding collapse of the eigenvectors.
It meant only a partial indistinctness of the eigenvectors, inasmuch
as within a certain subspace of m dimensions any m mutually orthogonal
axes could be chosen as eigenvectors. This is different in the
nonsymmetric case. The original skew-angular reference system may
be very far from an orthogonal system, and it may happen that in the
limit some of the base vectors collapse into one.

The final transformation gives us a clue concerning the original
orientation of the base vectors. The transformation from x to $\bar{x}$
occurred with the help of the equation

$$x = U\bar{x}$$

Multiplying by $\tilde{V}$, we obtain the inverse of this equation:

$$\bar{x} = \tilde{V}x$$    (2-11.21)

This transformation equation shows that the original base vectors,
analyzed in the orthogonal reference system of the principal axes, are
given by the successive columns of the matrix $\tilde{V}$; i.e., the
successive rows of the matrix V. If we could guarantee that our original
base vectors represented a linearly independent set of vectors, the
determinant of the matrix V would of necessity be different from zero,
and we would know that the principal axes of A form a system of n
linearly independent vectors. It can happen, however, that during the
limit process, when two near eigenvalues eventually collapse into one,
also the original system of base vectors eventually becomes linearly
dependent. In that case, the determinant of V becomes zero, which
means that the number of linearly independent eigenvectors of A is
less than n.


An interesting example is provided by the three matrices:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & -1 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}$$    (2-11.22)

In all three cases the determinant equation which gives the eigenvalues
$\lambda_i$ is reduced to

$$(\lambda - 1)^3 = 0$$    (2-11.23)

which shows that $\lambda = 1$ is the only possible eigenvalue, with the
multiplicity 3. The three generally distinct eigenvalues of a 3 x 3
matrix collapse in our case into one.

Now in the first case the given matrix is the unit matrix. The 
associated quadratic surface is a sphere. Here we can choose any 
three mutually orthogonal axes as the principal axes of the given 
matrix. The three principal axes exist and they are well separated, 
although their direction is not uniquely determined, and an infinity 
of orthogonal systems can be chosen as principal axes. 

Let us now examine the second matrix. The equations of the
principal axes become

$$x_1 + x_2 + 2x_3 = x_1$$
$$x_2 = x_2$$
$$x_3 = x_3$$

This is reducible to the single equation

$$x_2 + 2x_3 = 0$$

which has two independent solutions: $x_2 = x_3 = 0$, and
$x_2 = -2x_3$ with $x_3 \neq 0$.
In the first case $x_1$ cannot vanish, since otherwise the entire vector
would vanish. Hence we can choose

$$u_1 = (1, 0, 0)$$

as our first solution. The second solution leaves $x_1$ arbitrary, and we
may choose

$$u_2 = (0, -2, 1)$$

as our second solution. The other possibility,

$$u_3 = (\alpha, -2, 1)$$

with an arbitrary $\alpha$, is only a linear combination of $u_1$ and $u_2$:

$$u_3 = \alpha u_1 + u_2$$

Hence we see that the multiple eigenvalue $\lambda = 1$ in our case is
associated with only two linearly independent eigenvectors.

Let us now examine the third matrix. Here the equations of the
principal axes become

$$x_1 - x_2 + 3x_3 = x_1$$
$$x_2 + 2x_3 = x_2$$
$$x_3 = x_3$$

This system is equivalent to

$$x_3 = 0$$
$$-x_2 + 3x_3 = 0$$

The solution of this system is

$$u_1 = (1, 0, 0)$$

Hence in this case the number of eigenvectors is only one. The three
generally independent eigenvectors collapsed into one.

We see that the collapse of eigenvalues can at the same time cause a
collapse of the associated eigenvectors, a phenomenon which has no
analogy in the realm of real symmetric matrices. The question can be
asked whether this peculiar behavior of the eigenvectors could be
foreseen without going through the explicit solution of the eigenvalue
problem. This is indeed the case, as we can see if we pay attention
to the Hamilton-Cayley equation. In the general case we know that

$$(A - \lambda_1 I)(A - \lambda_2 I) \cdots (A - \lambda_n I) = 0$$    (2-11.24)

This identity makes it possible to express the nth power of A in terms
of the lower powers. Now, if all the $\lambda_i$ are different, there is no
identity of lower order in existence; i.e., it is not possible that
already a power of A less than n is reducible to the lower powers.
But in the case of multiple roots, the situation is different. It is
possible that a multiple root may occur only once in the
Hamilton-Cayley equation and not m times, if m is the multiplicity of
the root.

In the case of our three matrices, the eigenvalue $\lambda = 1$ has the
multiplicity 3. Hence we know in advance that the full
Hamilton-Cayley equation will be

$$(A - I)^3 = 0$$    (2-11.25)

and this equation will certainly be satisfied in all three cases. But it
may be that already

$$(A - I)^2 = 0$$    (2-11.26)

or even that

$$A - I = 0$$    (2-11.27)


In our first example the given matrix was actually the unit matrix, and 
thus the case (27) is actually realized. Here the matrix identity of 
lowest order is not of third but only of first order. Already A itself 
is reducible to a mere constant. The matrix identity of lowest order 
contains the multiple root not more than once. This is an indication 
that, while we lose uniqueness in the determination of the eigen- 
vectors, we do not lose dimensions. The eigenvectors still include the 
entire n-dimensional space. 

Let us see now what happens in the second case. We form A - I
and then $(A - I)^2$:

$$A - I = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad (A - I)^2 = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

Here A - I is not zero but

$$(A - I)^2 = 0$$

The multiple root $\lambda = 1$ entered the identity of lowest order with the
multiplicity 2, and this indicates that one of the space dimensions is
lost. The eigenvectors of A include a two-dimensional space only.
This is what we have actually found. There were only two linearly
independent eigenvectors in existence.

We now come to the examination of the third case.

$$A - I = \begin{pmatrix} 0 & -1 & 3 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}$$

$$(A - I)^2 = \begin{pmatrix} 0 & -1 & 3 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & -1 & 3 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

$$(A - I)^3 = \begin{pmatrix} 0 & 0 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & -1 & 3 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = 0$$

Here the multiple root $\lambda = 1$ entered the matrix identity of lowest
order with triple multiplicity. Hence we lose two dimensions and the
eigenvectors of the matrix include a 3 - 2 = 1-dimensional subspace
only. Indeed, we have seen that the given matrix did not have
more than one single eigenvector. (Matrices of this kind whose
principal axes avoid certain dimensions of space are called “defective
matrices.”)

Hence we see that the degree of the defectiveness of a matrix can 
always be established by constructing the identity of lowest order 
which connects the powers of a matrix and examining the multi- 
plicity with which the collapsing roots enter this identity. 
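
For the three matrices of (2-11.22) the bookkeeping can be repeated mechanically. The following sketch, with hypothetical helper names, finds the lowest power k for which the k-th power of A - I vanishes and, independently, counts the linearly independent eigenvectors as n minus the rank of A - I. It assumes that the single eigenvalue 1 is known in advance, as it is here, and it is meant only as a check of these three examples, not as a general procedure.

```python
from fractions import Fraction


def matmul(p, q):
    """Ordinary row-by-column matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def rank(m):
    """Rank by exact Gaussian elimination."""
    rows = [[Fraction(c) for c in r] for r in m]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            f = rows[r][col] / rows[found][col]
            rows[r] = [a - f * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def defect_report(a, lam=1):
    """Return (lowest k with (A - lam I)^k = 0, number of independent eigenvectors)."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    power, k = shifted, 1
    while any(c != 0 for row in power for c in row):
        if k == n:
            raise ValueError("lam is not the only eigenvalue of a")
        power, k = matmul(power, shifted), k + 1
    return k, n - rank(shifted)


matrices = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 1, 2], [0, 1, 0], [0, 0, 1]],
    [[1, -1, 3], [0, 1, 2], [0, 0, 1]],
]
results = [defect_report(m) for m in matrices]
print(results)    # [(1, 3), (2, 2), (3, 1)]
```

In each of the three cases the number of eigenvectors equals 3 - (k - 1), in agreement with the count of lost dimensions given in the text.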


12. The invariance of matrix equations under orthogonal trans- 
formations. The method of coordinate transformations is of pro- 
found importance in the study of matrices. Coordinates have no 
absolute significance and can be discarded in favor of other co- 
ordinates. We can view a matrix from a certain frame of reference, 
but we may also introduce a new frame of reference which may be 
more suited to the nature of the given problem. The question of the 
“most adequate reference system” is therefore frequently of pro- 
found significance. We may start with a given frame of reference 
which does not do justice to the general symmetry properties of the
given problem. We will obtain a better basis for our analysis if we 
abandon the original system in favor of a more adequate system. In 
principle, however, all reference systems are equally admissible. 
Reference systems can be compared with a scaffold in erecting a 
building. The scaffold does not belong to the building; it serves 
merely the purpose of getting access to all possible points of the 
building. But after the building has been constructed, the scaffold can 
be removed. This fundamental difference between scaffold and 
building was the departure point of Einstein’s celebrated “theory of 
relativity.” It is not permissible to intermingle something which 
belongs to the scaffold with something which belongs to the building. 
The nature of the frame of reference has to stand out clearly as an 
auxiliary construction which is part of the description of nature but 
does not belong to the inner essence of nature. In Newtonian 
physics scaffold and building were so strongly cemented together that 
the entire building collapsed if the scaffold was removed. This 
difficulty was finally solved when Einstein’s theory showed how to 
take into account the relative nature of coordinates. 




Matrix algebra is likewise an example of the operation of the 
principle of relativity. A matrix is associated with a certain space 
structure and the analysis of that space structure demands the use of 
coordinates. These coordinates have no absolute significance and 
can be abandoned in favor of other coordinates. But the inner laws 
expressed by the equations between matrices cannot be affected by 
the accidental frame of reference in which matrices are analyzed. We 
say that the equations of matrix algebra “remain invariant” with 
respect to a transformation of the coordinates. 

We will first restrict ourselves to a rectangular frame of axes. We
then have a set of base vectors:

$$u_1, u_2, \ldots, u_n$$    (2-12.1)

which are mutually orthogonal and whose length is 1.

$$u_i \cdot u_k = 0 \quad (i \neq k), \qquad u_i^2 = 1$$    (2-12.2)

We will now introduce a new set of orthogonal axes:

$$\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_n$$    (2-12.3)

by the transformation

$$\begin{aligned} \bar{u}_1 &= u_{11} u_1 + u_{21} u_2 + \cdots + u_{n1} u_n \\ &\;\;\vdots \\ \bar{u}_n &= u_{1n} u_1 + u_{2n} u_2 + \cdots + u_{nn} u_n \end{aligned}$$    (2-12.4)

This transformation is characterized by the matrix U.

$$U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & & \vdots \\ u_{n1} & u_{n2} & \cdots & u_{nn} \end{pmatrix}$$    (2-12.5)

The successive columns of this matrix give us the components of the
first, second, ..., nth new base vectors, analyzed in the original
system. The fact that the new base vectors again satisfy the
orthogonality conditions (2) finds expression in the matrix equation

$$\tilde{U} U = I$$    (2-12.6)


The relation between the original coordinates,

$$x = (x_1, x_2, \ldots, x_n)$$

and the new coordinates,

$$\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$$

is expressed in the matrix equation

$$x = U\bar{x}$$    (2-12.7)

Let us now consider the matrix equation

$$Ax = b$$    (2-12.8)

In the new reference system the same equation will take the form

$$\bar{A}\bar{x} = \bar{b}$$    (2-12.9)

Now x and b are both vectors and thus they both follow the same
transformation law (7).

$$x = U\bar{x}, \qquad b = U\bar{b}$$    (2-12.10)

Introducing these equations in (8), we obtain

$$AU\bar{x} = U\bar{b}$$    (2-12.11)

We will now premultiply by $\tilde{U}$ and take advantage of the
orthogonality relation (6). Then

$$\tilde{U}AU\bar{x} = \tilde{U}U\bar{b} = \bar{b}$$

Hence we have to put

$$\bar{A} = \tilde{U}AU$$    (2-12.12)

in order to obtain the form (9) of the equation.

We see that the transformation of a matrix differs from the
transformation of a vector by the appearance of $\tilde{U}$ in front of A
and of U in the back of A. If the vector transformation (7) is
premultiplied by $\tilde{U}$, we obtain

$$\bar{x} = \tilde{U}x$$    (2-12.13)

while

$$\bar{A} = \tilde{U}AU$$

In the vector transformation only premultiplication occurs; in the
matrix transformation pre- and postmultiplication.
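
The two transformation laws can be confirmed with exact arithmetic. The sketch below assumes an illustrative orthogonal matrix built from the 3-4-5 triangle and an arbitrary 2 x 2 matrix A; the helper names are hypothetical. It forms the transformed matrix and vectors according to (12) and (13) and checks that the equation Ax = b reappears in the new system in the form (9).

```python
from fractions import Fraction as F


def matmul(p, q):
    """Ordinary row-by-column matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def transpose(m):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*m)]


def apply(m, vec):
    """Matrix times column vector."""
    return [sum(a * c for a, c in zip(row, vec)) for row in m]


U = [[F(3, 5), F(-4, 5)],
     [F(4, 5), F(3, 5)]]          # illustrative rotation; columns are the new axes
Ut = transpose(U)
orthogonality = matmul(Ut, U)     # relation (6): should be the unit matrix

A = [[2, 1],
     [1, 3]]
x = [1, 2]
b = apply(A, x)                   # original equation A x = b

A_bar = matmul(Ut, matmul(A, U))  # matrix law: pre- and postmultiplication
x_bar = apply(Ut, x)              # vector law: premultiplication only
b_bar = apply(Ut, b)
print(apply(A_bar, x_bar) == b_bar)   # True
```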

Let us now consider arbitrary combinations of matrices and see
whether the transformation law (12) will hold again.

$$\bar{A} + \bar{B} = \tilde{U}AU + \tilde{U}BU = \tilde{U}(A + B)U$$

Hence the sum of matrices follows the same transformation law as a
single matrix. The same is true of the products of matrices:

$$\bar{A}\bar{B} = \tilde{U}AU\,\tilde{U}BU = \tilde{U}ABU$$

in view of the fact that

$$U\tilde{U} = I$$


We can now say that any combination of matrices obtained by the
fundamental operations addition and multiplication transforms
exactly in the same way as one single matrix. This means the
following. Let

$$F(A, B, \ldots, P)$$

be an arbitrary algebraic function of the arbitrary matrices
A, B, ..., P. Then

$$F(\bar{A}, \bar{B}, \ldots, \bar{P}) = \tilde{U}\,F(A, B, \ldots, P)\,U$$

If now we have an arbitrary algebraic equation between matrices:

$$F(A, B, \ldots, P) = 0$$    (2-12.14)

then also

$$F(\bar{A}, \bar{B}, \ldots, \bar{P}) = 0$$    (2-12.15)

This shows that matrix equations remain invariant under arbitrary
orthogonal transformations.
We can go one step further and include the operation
“transposition” in our results. If

$$\bar{A} = \tilde{U}AU$$

we obtain by the law of transposition:

$$\tilde{\bar{A}} = \tilde{U}\tilde{A}U$$

We see that the transpose of a matrix is transformed in the same way
as the matrix itself. Hence matrix equations which are formed by
algebraic operations and transposition remain invariant under
orthogonal transformations. The equation

$$F(A, B, \ldots, P, \tilde{A}, \tilde{B}, \ldots, \tilde{P}) = 0$$

implies

$$F(\bar{A}, \bar{B}, \ldots, \bar{P}, \tilde{\bar{A}}, \tilde{\bar{B}}, \ldots, \tilde{\bar{P}}) = 0$$

For example, if $\tilde{A} - A = 0$ then also $\tilde{\bar{A}} - \bar{A} = 0$,


which means that the symmetry of a matrix is preserved under
orthogonal transformations. Similarly, if

$$\tilde{A} = -A \qquad \text{then also} \qquad \tilde{\bar{A}} = -\bar{A}$$

which means that if a matrix is “antisymmetric” in one frame of
reference, it remains antisymmetric in all orthogonal frames of
reference.

Two fundamental quantities remain unchanged under an orthogonal
transformation. The one is the unit matrix:

$$\bar{I} = \tilde{U}IU = \tilde{U}U = I$$

The other is the “dot product,” or “scalar product” (inner product)
of two vectors:

$$\bar{x} \cdot \bar{y} = \tilde{U}x \cdot \tilde{U}y = y \cdot U\tilde{U}x = y \cdot x = x \cdot y$$
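
These invariance statements lend themselves to a direct check. The sketch below again assumes an illustrative exact rotation matrix, together with one symmetric and one antisymmetric 2 x 2 matrix; the helper names are hypothetical. It tests that symmetry and antisymmetry survive the transformation, that a product transforms like a single matrix, and that the dot product of two vectors is unchanged.

```python
from fractions import Fraction as F


def matmul(p, q):
    """Ordinary row-by-column matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def transpose(m):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*m)]


def apply(m, vec):
    """Matrix times column vector."""
    return [sum(a * c for a, c in zip(row, vec)) for row in m]


def dot(a, b):
    """Sum of products of corresponding components."""
    return sum(p * q for p, q in zip(a, b))


U = [[F(3, 5), F(-4, 5)],
     [F(4, 5), F(3, 5)]]          # illustrative orthogonal matrix
Ut = transpose(U)


def transform(m):
    """Orthogonal transformation of a matrix: transpose(U) m U."""
    return matmul(Ut, matmul(m, U))


A = [[2, 1], [1, 3]]              # symmetric
B = [[0, 1], [-1, 0]]             # antisymmetric
A_bar, B_bar = transform(A), transform(B)

symmetric_kept = A_bar == transpose(A_bar)
antisymmetric_kept = B_bar == [[-c for c in row] for row in transpose(B_bar)]
product_law = matmul(A_bar, B_bar) == transform(matmul(A, B))

x, y = [1, 2], [3, -1]
dot_kept = dot(apply(Ut, x), apply(Ut, y)) == dot(x, y)
print(symmetric_kept, antisymmetric_kept, product_law, dot_kept)
```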


13. The invariance of matrix equations under arbitrary linear
transformations. We will now drop the restriction to rectangular
coordinates and consider the general case of arbitrarily oriented
skew-angular frames of reference. The base vectors
$u_1, u_2, \ldots, u_n$ are no longer orthogonal to each other, nor is
their length normalized to 1. Any n vectors which are linearly
independent are admitted. We have seen that under these circumstances
we have to double the given set by introducing the adjoint vectors
$v_1, v_2, \ldots, v_n$.

$$\begin{matrix} u_1, u_2, \ldots, u_n \\ v_1, v_2, \ldots, v_n \end{matrix} \qquad \begin{matrix} u_i \cdot v_k = 0 \quad (i \neq k) \\ u_i \cdot v_i = 1 \end{matrix}$$    (2-13.1)

Any vector can now be analyzed in either the one or the other frame
of reference. Once more we will introduce a linear transformation
by choosing a new set of base vectors, and their adjoints.

$$\begin{matrix} \bar{u}_1, \bar{u}_2, \ldots, \bar{u}_n \\ \bar{v}_1, \bar{v}_2, \ldots, \bar{v}_n \end{matrix} \qquad \begin{matrix} \bar{u}_i \cdot \bar{v}_k = 0 \quad (i \neq k) \\ \bar{u}_i \cdot \bar{v}_i = 1 \end{matrix}$$    (2-13.2)

These vectors, if analyzed in the original frame of reference, can be
characterized as follows:

$$\begin{aligned} \bar{u}_1 &= u_{11} u_1 + \cdots + u_{n1} u_n, & \bar{v}_1 &= v_{11} v_1 + \cdots + v_{n1} v_n \\ &\;\;\vdots & &\;\;\vdots \\ \bar{u}_n &= u_{1n} u_1 + \cdots + u_{nn} u_n, & \bar{v}_n &= v_{1n} v_1 + \cdots + v_{nn} v_n \end{aligned}$$    (2-13.3)

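These two transformations can be included in the two matrices

$$U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & & \vdots \\ u_{n1} & u_{n2} & \cdots & u_{nn} \end{pmatrix}, \qquad V = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1n} \\ v_{21} & v_{22} & \cdots & v_{2n} \\ \vdots & \vdots & & \vdots \\ v_{n1} & v_{n2} & \cdots & v_{nn} \end{pmatrix}$$    (2-13.4)

The biorthogonality relations between the $\bar{u}_i$, $\bar{v}_i$ find
expression in the matrix equation

$$\tilde{U}V = I$$    (2-13.5)

which can be written in the alternate forms

$$\tilde{U}V = \tilde{V}U = V\tilde{U} = U\tilde{V} = I$$    (2-13.6)

(since a matrix is always commutative with its inverse).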
The transformation of coordinates is expressed by the following
matrix equations.

$$x = U\bar{x}, \qquad \xi = V\bar{\xi}$$    (2-13.7)

Once more we consider the matrix equation

$$Ax = b$$    (2-13.8)

and introduce the transformation of the coordinates

$$x = U\bar{x}, \qquad b = U\bar{b}$$

We obtain

$$AU\bar{x} = U\bar{b}$$

and premultiplying by $\tilde{V}$ we get

$$\tilde{V}AU\bar{x} = \bar{b}$$

which shows that the transformation of the matrix A has to occur
according to the equation

$$\bar{A} = \tilde{V}AU$$    (2-13.9)

On the other hand, if we premultiply the first equation of (7) by
$\tilde{V}$ we obtain

$$\bar{x} = \tilde{V}x$$

Once more we observe that the transformation of a matrix differs
from the transformation of a vector by the second factor by which it
is postmultiplied.


Since $\tilde{V}$ is the reciprocal of U, we can also put

$$\bar{A} = U^{-1}AU$$    (2-13.10)

Moreover, we could have started our deductions with the “adjoint
equation”

$$\tilde{A}\xi = \beta$$    (2-13.11)

which expresses the equation (8) in the adjoint frame of reference.
Then

$$\xi = V\bar{\xi}, \qquad \beta = V\bar{\beta}$$    (2-13.12)

leads to

$$\tilde{\bar{A}} = \tilde{U}\tilde{A}V$$    (2-13.13)

which may also be written in the form

$$\tilde{\bar{A}} = V^{-1}\tilde{A}V$$    (2-13.14)

Equation (13) is in harmony with (9) and can be obtained by
transposing the previous equation.

Once more we can extend the transformation law (9) to the sum of
matrices and to the product of matrices, and thus to any
combination of the two fundamental operations “addition” and
“multiplication” of matrix algebra. Once more we find that any
algebraic equation between matrices,

$$F(A, B, \ldots, P) = 0$$    (2-13.15)

which holds in one frame of reference, remains true in any other
frame of reference

$$F(\bar{A}, \bar{B}, \ldots, \bar{P}) = 0$$    (2-13.16)

The invariance of matrix equations with respect to coordinate
transformations is once more demonstrated, but now extended from
orthogonal to arbitrary linear transformations.

The only difference compared with the orthogonal case is that the
relation (15) must not include the operation of transposition. The
transposed matrix $\tilde{A}$ follows a transformation law which is different
from the transformation law of $A$ [cf. (10) versus (14)]. Matrix
equations which involve the transpose of a matrix do not remain
invariant under arbitrary coordinate transformations.¹ For example,
the equation

$$\tilde{A} = A$$

(which expresses the symmetry of $A$) is lost under the impact of an
arbitrary linear transformation, although it is preserved under
strictly orthogonal transformations.

¹ The inverse $A^{-1}$ of $A$ transforms in the same way as $A$ itself. Hence
the equation (15) may include the inverses $A^{-1}, B^{-1}, \ldots, P^{-1}$ (if they exist).
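The invariance statement can be checked numerically on a small case. The following is only a sketch: the 2 by 2 matrices `U`, `A`, `B` are hypothetical data chosen for illustration, the helper names are not from the text, and exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][j] * Y[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]

def inverse_2x2(M):
    """Exact inverse of a 2 by 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def transform(M, U):
    """The transformation law (2-13.10): U^{-1} M U."""
    return matmul(matmul(inverse_2x2(U), M), U)

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical data: a non-orthogonal U, a symmetric A, an arbitrary B.
U = [[F(1), F(2)], [F(0), F(1)]]
A = [[F(2), F(1)], [F(1), F(3)]]
B = [[F(0), F(1)], [F(4), F(5)]]

A_bar, B_bar = transform(A, U), transform(B, U)

# A product relation C = AB survives the change of frame ...
print(transform(matmul(A, B), U) == matmul(A_bar, B_bar))   # True
# ... but the symmetry of A does not.
print(A == transpose(A), A_bar == transpose(A_bar))          # True False
```

Here the transformed matrix comes out with the rows (0, -5) and (1, 5), which is no longer symmetric, while the product relation holds exactly.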

The unit matrix $I$ and the scalar product of two vectors are again
invariants of an arbitrary linear transformation:

$$\bar{I} = \tilde{V}IU = \tilde{V}U = I \tag{2-13.17}$$

In the case of a scalar product, it is imperative that the two vectors
shall be given in adjoint representations:

$$\tilde{\xi}x = \tilde{\bar{\xi}}\,\tilde{V}U\bar{x} = \tilde{\bar{\xi}}\,\bar{x} \tag{2-13.18}$$


14. Commutative and noncommutative matrices. Matrix multiplication
is generally noncommutative: $AB \neq BA$. We may have,
however, two matrices for which the commutative law of ordinary
multiplication is satisfied:

$$AB = BA \tag{2-14.1}$$


By the principle of the invariance of matrix equations with respect to 
coordinate transformations, equation (1) must remain valid in every 
frame of reference and must thus express some inherent property of 
the two matrices A and B. What is this property ? 

We have seen that a matrix, by introducing its principal axes as a
new frame of reference, can be transformed into a purely diagonal
form. Then in the new frame of reference $\bar{A} = \Lambda$. Now it may
happen that a second matrix $B$ possesses the same principal axes
(although different eigenvalues) and thus becomes likewise diagonal
in the new reference system. Then

$$\bar{A} = \Lambda, \qquad \bar{B} = \Lambda'$$

Now the product of two diagonal matrices is again a diagonal
matrix, with the diagonal elements:

$$\Lambda\Lambda' = \begin{pmatrix} \lambda_1\lambda_1' & & & \\ & \lambda_2\lambda_2' & & \\ & & \ddots & \\ & & & \lambda_n\lambda_n' \end{pmatrix}$$

These elements are symmetric with respect to the first and the second
factor, and hence

$$\Lambda\Lambda' = \Lambda'\Lambda$$




Diagonal matrices are always commutative. But then $A$ and $B$ are
commutative in the new frame of reference, and equation (1) is proved
in that particular frame. Since, however, the independence of that
equation from any particular frame is proved in advance, we have
found the necessary and sufficient condition for the commutability of
two matrices; viz., that their principal axes must be parallel.

For example, the following two matrices are commutative and 
have therefore the same principal axes: 


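$$
A = \begin{pmatrix} 33 & 16 & 72 \\ -24 & -10 & -57 \\ -8 & -4 & -17 \end{pmatrix}, \qquad
B = \begin{pmatrix} -47 & -32 & -84 \\ 36 & 24 & 66 \\ 12 & 8 & 22 \end{pmatrix}
$$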
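The commutativity of this pair can be confirmed by direct multiplication. The sketch below takes $A$ with first row 33, 16, 72, as the same matrix appears in the elimination example of § 15; the helper name `matmul` is introduced here only for illustration.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][j] * Y[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]

A = [[33, 16, 72], [-24, -10, -57], [-8, -4, -17]]
B = [[-47, -32, -84], [36, 24, 66], [12, 8, 22]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)        # [[-111, -96, -132], [84, 72, 102], [28, 24, 34]]
print(AB == BA)  # True
```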


The unit matrix is geometrically represented by a sphere. Any axis 
can be chosen as a principal axis. Hence the unit matrix commutes 
with any matrix: 


IA=AI=A 


Moreover, the inverse $A^{-1}$ of the matrix $A$ has the same principal
axes as $A$ itself. Hence $A$ and $A^{-1}$ are commutative

$$AA^{-1} = A^{-1}A = I$$


Matrices which differ in their principal axes cannot be commutative. 
Matrix multiplication is generally noncommutative because two 
matrices have generally arbitrarily oriented principal axes and 
arbitrary eigenvalues. If the principal axes agree, although the 
eigenvalues are still arbitrary, the multiplication becomes com- 
mutative. 


15. Inversion of a matrix. The Gaussian elimination method. A 
simultaneous system of linear algebraic equations can be written 
down in matrix form as follows: 


Ax = b (2-15.1) 
where $A$ is a given matrix, $b$ is a given vector, and $x$ is the unknown
vector. By premultiplying by $A^{-1}$, the equation can be formally
solved and we obtain

$$x = A^{-1}b \tag{2-15.2}$$

The matrix $A^{-1}$ is called the "reciprocal" or the "inverse" of $A$, and
any method by which $A^{-1}$ can be calculated is called the inversion
of $A$.

It is not always necessary, however, actually to invert the matrix 
for solving the linear equation (1). It depends on what the role of 
the right side b is, relative to the given problem. It is possible that 
the right side b is just as much an inherent part of our problem as 
the matrix A. In that case, we are satisfied if we can solve equation 
(1) for that particular right side. Frequently, however, the right side 
of the equation plays a more accidental role. The matrix A is 
inherently associated with the given physical situation, while the 
right side has frequently the significance of a “forcing function.” 
We may want to obtain the response of the given structure to a 
variety of forcing functions. Hence we may want to change the 
right side freely, although the left side of the equation remains 
unchanged. In this case, we actually want the possession of $A^{-1}$;
because, having $A^{-1}$, we let $A^{-1}$ operate on $b$, and our solution is
obtained. If we do not possess $A^{-1}$, we have to go through a larger
number of algebraic operations for each separate b. On the other 
hand, the construction of the inverse of A requires much more labor 
than the solution of a set of linear equations for a given right side. 

The fundamental method of inverting a matrix was introduced by 
Gauss and is called the Gaussian elimination method. It works 
equally for solving a set of linear equations and for inverting a 
matrix. Its underlying principle is simple and the operations 
involved are easily accomplished. It belongs to that class of 
numerical procedures which make use of a large number of very 
simple operations, instead of a smaller number of more involved 
operations. What we accomplish is that gradually the zero elements 
of the unit matrix shift over to the left side, while the right side fills 
up more and more with elements. Eventually the unit matrix is on 
the left side and the right side is completely filled up with elements. 
This terminates our task. 

We can analyze the process by applying it first to the case of a 
linear system with a given right side. We write the equation (1) in 
the form 


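$$Ax - b = 0 \tag{2-15.3}$$

and indicate the system symbolically by writing only the coefficients
of the system, while the variables appear on the top of the scheme

$$
\begin{array}{cccc|c}
x_1 & x_2 & \cdots & x_n & -1 \\ \hline
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & & & \vdots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} & b_n
\end{array} = 0 \tag{2-15.4}
$$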


Since any equation can be multiplied by a factor, it is permissible to 
multiply any row by an arbitrary constant. Moreover, it is per- 
missible to multiply any equation by an arbitrary factor and add it to 
another equation. Correspondingly, we can multiply any row of the 
above scheme by an arbitrary factor and add it to any other row. 
These are the two fundamental operations on which the elimination 
method is based. 

First we look for the absolutely largest element of the matrix. Let
this element be $a_{ik}$. Then we divide the entire ith row by $a_{ik}$, with
the result that the new $a_{ik}$ becomes 1. Our aim is now to make all
the other elements of the kth column equal to 0. The kth column is
then composed of zeros and one solitary 1. The zeroing of an element
occurs by multiplying the ith row by that particular element and
subtracting it from the row which that element occupies. For example,


$$
\begin{array}{ccc|c}
x_1 & x_2 & x_3 & -1 \\ \hline
33 & 16 & 72 & 359 \\
-24 & -10 & -57 & -281 \\
-8 & -4 & -17 & -85
\end{array} = 0
$$


The largest element of the matrix is 72, found in the first row. 
Dividing the first row by 72, the new first row becomes 


0.458333 0.222222 1 4.98611 


The 1 appears in the third column. We zero out the elements —57 
and —17 of the same column by multiplying the new row by 57 and 
adding the result to the second row, and similarly by 17 and adding 
the result to the third row. The matrix appears at this stage as follows: 


$$
\begin{array}{ccc|c}
x_1 & x_2 & x_3 & -1 \\ \hline
0.458333 & 0.222222 & 1 & 4.98611 \\
2.12498 & 2.66665 & 0 & 3.20827 \\
-0.208339 & -0.222226 & 0 & -0.23613
\end{array}
$$




The variable $x_3$ is now eliminated, since the second and the third
equations do not contain it any more. The original 3 by 3 problem is 
thus reduced to a 2 by 2 problem. Generally the zeroing of one 
column has the effect that the original n by n problem is reduced to 
an (n — 1) by (n — 1) problem. 

We continue our scheme by hunting for the largest element in the 
remaining 2 by 2 matrix. We find 2.66665. Dividing the second row 
by this element, the new second row becomes 


$$\begin{array}{ccc|c} 0.796872 & 1 & 0 & 1.203108 \end{array}$$

We zero the second column by multiplying this row by 0.222222 and
-0.222226 and subtracting it from the first and the third rows. The
new shape of the matrix becomes

$$
\begin{array}{ccc|c}
0.281250 & 0 & 1 & 4.71875 \\
0.796872 & 1 & 0 & 1.20311 \\
-0.031253 & 0 & 0 & 0.031232
\end{array}
$$

Finally we divide the third row by -0.031253 and obtain

$$\begin{array}{ccc|c} 1 & 0 & 0 & -1.00067 \end{array}$$

We zero the first column by multiplying the new third row by
0.281250 and 0.796872 and subtracting it from the first and second
rows. This gives

$$
\begin{array}{ccc|c}
0 & 0 & 1 & 5.00019 \\
0 & 1 & 0 & 2.00052 \\
1 & 0 & 0 & -1.00067
\end{array}
$$
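The transformed set of equations is thus

$$
\begin{aligned}
x_3 - 5.00019 &= 0 \\
x_2 - 2.00052 &= 0 \\
x_1 + 1.00067 &= 0
\end{aligned}
$$

This gives the solution

$$
\begin{aligned}
x_1 &= -1.00067 & &[\text{correct: } -1] \\
x_2 &= 2.00052 & &[\text{correct: } 2] \\
x_3 &= 5.00019 & &[\text{correct: } 5]
\end{aligned}
$$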
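The elimination scheme just described can be sketched in a few lines. This is an illustrative sketch, not the author's own formulation: the function name and the bookkeeping sets are assumptions introduced here, and exact rational arithmetic replaces the six-decimal hand calculation, so the rounding errors of the worked example do not appear.

```python
from fractions import Fraction

def solve_by_elimination(A, b):
    """Solve A x = b by eliminating on the largest remaining element."""
    n = len(A)
    # Tableau rows hold the coefficients followed by the right side.
    T = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(A, b)]
    free_rows, free_cols = set(range(n)), set(range(n))
    pivot_col = {}                      # row index -> column holding its 1
    for _ in range(n):
        # Absolutely largest element among rows and columns not yet used.
        i, k = max(((r, c) for r in free_rows for c in free_cols),
                   key=lambda rc: abs(T[rc[0]][rc[1]]))
        if T[i][k] == 0:
            raise ValueError("matrix is singular")
        pivot = T[i][k]
        T[i] = [v / pivot for v in T[i]]            # the pivot becomes 1
        for r in range(n):
            if r != i:                               # zero the rest of column k
                factor = T[r][k]
                T[r] = [v - factor * p for v, p in zip(T[r], T[i])]
        free_rows.discard(i)
        free_cols.discard(k)
        pivot_col[i] = k
    x = [None] * n
    for i, k in pivot_col.items():
        x[k] = T[i][n]                  # row i now reads x_k - value = 0
    return x

A = [[33, 16, 72], [-24, -10, -57], [-8, -4, -17]]
b = [359, -281, -85]
print([str(v) for v in solve_by_elimination(A, b)])   # ['-1', '2', '5']
```

The pivots are chosen in the same order as in the text (72, then the element in the second row and second column, then the remaining element of the first column), and the exact solution $-1, 2, 5$ is recovered.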


Quite similar is the procedure of inverting the matrix. The only 
difference is that the right side is now taken by the unit matrix $I$.
Hence we have to operate with n columns instead of one column. 


But the successive operations are identical with those of the previous
algorithm.

$$
\begin{array}{ccc|ccc}
33 & 16 & 72 & 1 & 0 & 0 \\
-24 & -10 & -57 & 0 & 1 & 0 \\
-8 & -4 & -17 & 0 & 0 & 1
\end{array}
$$


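First transformation:

$$
\begin{array}{ccc|ccc}
0.458333 & 0.222222 & 1 & 0.0138888 & 0 & 0 \\
2.12498 & 2.66665 & 0 & 0.791662 & 1 & 0 \\
-0.208339 & -0.222226 & 0 & 0.236110 & 0 & 1
\end{array}
$$

Second transformation:

$$
\begin{array}{ccc|ccc}
0.281250 & 0 & 1 & -0.052083 & -0.083333 & 0 \\
0.796872 & 1 & 0 & 0.296875 & 0.375002 & 0 \\
-0.031253 & 0 & 0 & 0.302083 & 0.083335 & 1
\end{array}
$$

Third transformation:

$$
\begin{array}{ccc|ccc}
0 & 0 & 1 & 2.66640 & 0.66661 & 8.9991 \\
0 & 1 & 0 & 7.9992 & 2.49983 & 25.4974 \\
1 & 0 & 0 & -9.66572 & -2.66646 & -31.9969
\end{array}
$$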


The position of the 1 in the first part of the final matrix indicates that 
the rows of the final result have to be rearranged in the sequence 
3, 2, 1. The inverse matrix thus becomes 


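$$
A^{-1} \approx \begin{pmatrix} -9.66572 & -2.66646 & -31.9969 \\ 7.9992 & 2.49983 & 25.4974 \\ 2.66640 & 0.66661 & 8.9991 \end{pmatrix}
$$

The correct inverse, given to five decimal places, is

$$
A^{-1} = \begin{pmatrix} -9.66667 & -2.66667 & -32.00000 \\ 8.00000 & 2.50000 & 25.50000 \\ 2.66667 & 0.66667 & 9.00000 \end{pmatrix}
$$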


The discrepancies are caused by rounding errors. How quickly the 
rounding errors can accumulate is demonstrated even by this simple 
3 by 3 example. Although the calculations were made with an 
accuracy of $10^{-6}$, the results have an accuracy of only $10^{-4}$. In large
matrices the accumulation of rounding errors can become quite 
serious. 
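The same procedure, applied to the scheme with the unit matrix on the right, can be sketched as follows. As before this is only an illustration under stated assumptions: the function and variable names are introduced here, and exact fractions are used in place of six-decimal arithmetic so that the result can be compared with the hand-computed inverse quoted above.

```python
from fractions import Fraction

def invert_by_elimination(A):
    """Invert A by the elimination scheme, choosing the largest remaining pivot."""
    n = len(A)
    # Each tableau row: the row of A followed by the matching row of I.
    T = [[Fraction(v) for v in row] + [Fraction(int(r == c)) for c in range(n)]
         for r, row in enumerate(A)]
    free_rows, free_cols = set(range(n)), set(range(n))
    row_with_one = {}                   # column index -> row holding its 1
    for _ in range(n):
        i, k = max(((r, c) for r in free_rows for c in free_cols),
                   key=lambda rc: abs(T[rc[0]][rc[1]]))
        if T[i][k] == 0:
            raise ValueError("matrix is singular")
        pivot = T[i][k]
        T[i] = [v / pivot for v in T[i]]
        for r in range(n):
            if r != i:
                factor = T[r][k]
                T[r] = [v - factor * p for v, p in zip(T[r], T[i])]
        free_rows.discard(i)
        free_cols.discard(k)
        row_with_one[k] = i
    # Rearrangement: the row whose 1 stands in column k is row k of the inverse.
    return [T[row_with_one[k]][n:] for k in range(n)]

A = [[33, 16, 72], [-24, -10, -57], [-8, -4, -17]]
exact = invert_by_elimination(A)

# The six-decimal hand result quoted in the text.
printed = [[-9.66572, -2.66646, -31.9969],
           [7.9992, 2.49983, 25.4974],
           [2.66640, 0.66661, 8.9991]]
worst = max(abs(float(e) - p)
            for exact_row, printed_row in zip(exact, printed)
            for e, p in zip(exact_row, printed_row))
print([[str(v) for v in row] for row in exact])
print(worst)
```

The last line prints the largest absolute discrepancy between the exact inverse and the hand-computed one, which gives a direct measure of the accumulated rounding error in this small example.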




16. Successive orthogonalization of a matrix. The numerical 
aspects of inverting a matrix are distinctly different from the purely 
mathematical aspects. Our calculations are always of only limited 
accuracy. The rounding errors constantly accumulate, and con- 
sidering the large number of operations involved in the inversion 
process, they may eventually obliterate the desired results. The 
Gaussian elimination scheme insures proper results if the effect of 
rounding errors can be neglected. But when inverting large matrices 
this is almost never the case. The numerical inversion of such 
matrices is of paramount interest, considering the fact that so many 
problems of contemporary physics and engineering lead to the 
solution of large linear systems. What procedure shall we follow 
under these circumstances ? 

The establishment of electronic digital computers started a new 
chapter in the history of numerical analysis. The extraordinary 
rapidity with which these machines perform the fundamental 
operations of arithmetic leads to a shift in our general philosophy of 
numerical operations. The emphasis is no longer on procedures 
which obtain a result in the smallest number of operations. More 
important is the viewpoint of simple codibility, together with the 
demand on high accuracy. In the problem of inverting a matrix we 
will be interested in a procedure which, in spite of the limited 
accuracy of arithmetical operations, cannot come to grief, no matter 
how extensive the matrix is to which it is applied. 

In discussing the question of accuracy we must realize that under
no circumstances can we expect absolute accuracy. Nor is the
question of accuracy a matter of arbitrary decisions. Rarely are the
elements of the given matrix $A$ absolute mathematical numbers. In
most cases the elements $a_{ik}$ of the matrix are obtained on the basis of
some measurements, which are automatically of limited accuracy
only. Let us now assume that as the result of some inversion
process we have obtained an approximate inverse of $A$. This $\bar{A}^{-1}$ is
certainly the exact inverse of a certain $\bar{A}$ which differs from $A$ by the
small amount $\epsilon A_1$:

$$\bar{A} = A + \epsilon A_1$$

Let it be true that all the elements of $\epsilon A_1$ are smaller than the
possible errors of $a_{ik}$. Under these circumstances we can describe $\bar{A}$
as being "numerically equivalent" to $A$, and the obtained $\bar{A}^{-1}$ can be
accepted as the correct answer to our problem. It would be entirely
superfluous to try to correct $\bar{A}^{-1}$ in order to arrive at the correct
$A^{-1}$. The mathematically correct answer has no significance, in view
of the limited accuracy of the given $A$. In the case of a "mathematical
matrix" $A$, the substitution of a "numerically equivalent" $\bar{A}$
loses, of course, all significance. But even then it will be advisable to
replace the exact elements of $A$ by elements limited to a definite
number of decimal places, carry through the inversion process and,
finally, if necessary, correct $\bar{A}^{-1}$ by a perturbation process (cf. § 23).

The inversion method described in this section has the advantage 
that we can keep the rounding errors under control, no matter how 
far the process continues. The accuracy of our calculations is deter- 
mined by the accuracy of the elements of A and can be adjusted, if 
the need arises, even to variable accuracy. We shall first discuss the 
mathematical aspects of the method, and then come to the question 
of the rounding errors. 

We once more consider the equation 


Ax = b (2-16.1) 


but interpret it in a slightly different manner. Writing out these 
equations in detail, we have 


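$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned}
\tag{2-16.2}
$$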


We will now consider each one of the columns of the matrix an 
independent vector, and do the same with the right side of the set. 
Hence we put 
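$$
\begin{aligned}
u_1 &= (a_{11}, a_{21}, \ldots, a_{n1}) \\
u_2 &= (a_{12}, a_{22}, \ldots, a_{n2}) \\
&\;\;\vdots \\
u_n &= (a_{1n}, a_{2n}, \ldots, a_{nn}) \\
b &= (b_1, b_2, \ldots, b_n)
\end{aligned}
\tag{2-16.3}
$$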


Our problem can now be interpreted as follows. Find a linear
combination of the vectors $u_1, u_2, \ldots, u_n$ such that the resultant
vector shall become the given vector $b$.

$$x_1u_1 + x_2u_2 + \cdots + x_nu_n = b \tag{2-16.4}$$




In this interpretation we have a given skew-angular reference
system, characterized by the base vectors $u_1, u_2, \ldots, u_n$, and we want
to analyze the given vector $b$ in this reference system. Hence
$(x_1, x_2, \ldots, x_n)$ are the skew-angular components of the vector $b$ in
the given set of base vectors. The condition that these base vectors
are linearly independent is equivalent to the condition that the
determinant of the given linear set is different from zero.

$$
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix} \neq 0 \tag{2-16.5}
$$


In discussing the operation with skew-angular reference systems
we have seen that a skew-angular set of base vectors demands
construction of the adjoint set $v_1, v_2, \ldots, v_n$. But this is equivalent to
construction of the inverse matrix, since the rows of the inverse
matrix represent the vectors $v_1, v_2, \ldots, v_n$.

In one particular case the problem is easily solvable. It may
happen that the base vectors are orthogonal to each other and their
length equals 1:

$$u_iu_k = 0 \quad (i \neq k), \qquad u_i^2 = 1 \tag{2-16.6}$$

In this case the system is self-adjoint, and we obtain

$$v_i = u_i \tag{2-16.7}$$

The inverse matrix is then simply the transpose of the original
matrix.

$$A^{-1} = \tilde{A} \tag{2-16.8}$$


Orthogonal reference systems are thus of particular advantage 
in dealing with linear systems. It will be our aim to introduce an 
auxiliary reference system which has the property of being ortho- 
gonal. We solve the problem in the new reference system and then 
return to the original frame of reference. 

We choose the first vector $u_1$ as the first vector of the new system.
However, we normalize its length to 1; i.e., we divide by the length of
the vector:

$$w_1 = \frac{u_1}{|u_1|} \tag{2-16.9}$$




Our next vector $w_2$ is chosen in the plane of the first two vectors $u_1$
and $u_2$. In this plane we find a vector which is orthogonal to $w_1$ and
whose length is 1. We call this vector $w_2$. Except for a $\pm$ sign, this
vector is uniquely determined. Next we go into the space included by
the first three vectors $u_1$, $u_2$, and $u_3$. In this space we find a particular
vector $w_3$ (except its orientation up or down) which is orthogonal to
both vectors $w_1$ and $w_2$, and whose length is 1. Next we go into the
space included by the first four vectors $u_1, u_2, u_3, u_4$ and find a
vector $w_4$ of the length 1 which is orthogonal to the three vectors
$w_1, w_2, w_3$. This construction can be continued until finally the space
of all the vectors $u_1, u_2, \ldots, u_n$ is exhausted. We have thus
constructed a system of $n$ mutually orthogonal vectors $w_1, w_2, \ldots, w_n$ of
the length 1.

Generally each new vector is obtained in two consecutive steps.
First we construct

$$w_i' = u_i - p_{i1}w_1 - p_{i2}w_2 - \cdots - p_{i,i-1}w_{i-1} \tag{2-16.10}$$

where the coefficients $p_{ik}$ are available since the dot-product of $w_i'$
with the previous vectors $w_k$ gives

$$p_{ik} = u_iw_k \tag{2-16.11}$$

Then we normalize the length of $w_i'$ to 1:

$$w_i = \frac{w_i'}{|w_i'|} \tag{2-16.12}$$

We will write

$$p_{ii} = |w_i'| \tag{2-16.13}$$

and obtain

$$u_i = p_{i1}w_1 + p_{i2}w_2 + \cdots + p_{ii}w_i \tag{2-16.14}$$


The elements $p_{ik}$ $(k \le i)$ can be included in a "triangular matrix,"
i.e., a matrix which has elements in the diagonal and below the
diagonal, while all the elements above the diagonal are put equal
to zero.
The equation (14) can now be written in the form of a matrix
equation:

$$A = W\tilde{P} \tag{2-16.15}$$

where $W$ is a matrix whose columns are in succession the vectors
$w_1, w_2, \ldots, w_n$. By construction $W$ is an orthogonal matrix:

$$\tilde{W}W = I \tag{2-16.16}$$

and thus

$$W^{-1} = \tilde{W} \tag{2-16.17}$$

¹ Division by zero in (12) cannot occur because $w_i'$ cannot vanish in all its
components if the determinant of $A$ is not zero.


Moreover, a triangular matrix can be inverted by successive
eliminations (cf. § 17), thus giving the new triangular matrix

$$Q = P^{-1} \tag{2-16.18}$$

But then we have, in consequence of (15):

$$A^{-1} = \tilde{Q}\tilde{W} \tag{2-16.19}$$

and the problem of inverting $A$ is accomplished.
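The construction (10) to (19) can be sketched in code as follows. This is a minimal illustration of the method in its unmodified form, before any provision for rounding errors: ordinary floating-point arithmetic is assumed, the function names are introduced here, and `P` is stored as a lower triangular array so that the ith column of $A$ equals $\sum_k p_{ik} w_k$.

```python
import math

def orthogonalize(A):
    """Return the orthonormal vectors w_i and the lower triangular P."""
    n = len(A)
    columns = [[A[r][c] for r in range(n)] for c in range(n)]   # the u_i
    W = []
    P = [[0.0] * n for _ in range(n)]
    for i, u in enumerate(columns):
        w = list(u)
        for k, wk in enumerate(W):
            P[i][k] = sum(a * b for a, b in zip(u, wk))          # (2-16.11)
            w = [a - P[i][k] * b for a, b in zip(w, wk)]         # (2-16.10)
        P[i][i] = math.sqrt(sum(a * a for a in w))               # (2-16.13)
        W.append([a / P[i][i] for a in w])                       # (2-16.12)
    return W, P

def invert_lower_triangular(P):
    """Q = P^{-1} by successive elimination, row by row."""
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = 1.0 / P[i][i]
        for k in range(i - 1, -1, -1):
            s = sum(Q[i][j] * P[j][k] for j in range(k + 1, i + 1))
            Q[i][k] = -s / P[k][k]
    return Q

def invert(A):
    """Inverse of A as the product Q~ W~ of (2-16.19)."""
    n = len(A)
    W, P = orthogonalize(A)
    Q = invert_lower_triangular(P)
    # Element (r, c): sum over k of Q[k][r] times component c of w_k.
    return [[sum(Q[k][r] * W[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

A = [[33, 16, 72], [-24, -10, -57], [-8, -4, -17]]
for row in invert(A):
    print([round(v, 5) for v in row])
```

Applied to the 3 by 3 matrix of § 15, the result agrees with the correct inverse quoted there to the displayed number of places.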

In actual fact the limited accuracy of our arithmetical operations
demands a slight modification of the method. If we proceed in the
previous fashion, not paying attention to the rounding errors, we
will gradually lose in the orthogonality of the vectors $w_i$. They
would remain automatically orthogonal to each other if we could
count on the exact orthogonality of the previous vectors $w_k$ and the
exactness of the coefficients $p_{ik}$. But neither of these conditions is
fulfilled and the result is that the vectors $w_i$ become gradually
deorthogonalized. Hence we cannot count on the accuracy of (17),
and the end result (19) becomes more and more unreliable as the
order of $A$ increases.

In view of this difficulty we shall now define the vector (10) by a
process which does not rely on the exact orthogonality of the
previous vectors $w_k$ but insures the orthogonality of the new $w_i$ to all
the previous $w_k$ even if these $w_k$ are not orthogonal. For this purpose
we determine the $p_{ik}$ of equation (10) by the least-square principle
that the square of the length of $w_i'$ shall be minimized:

$$(u_i - p_{i1}w_1 - \cdots - p_{i,i-1}w_{i-1})^2 = \text{minimum}$$

This gives the following linear set of equations:

$$
\begin{aligned}
p_{i1}w_1^2 + \cdots + p_{i,i-1}w_1w_{i-1} &= u_iw_1 \\
&\;\;\vdots \\
p_{i1}w_{i-1}w_1 + \cdots + p_{i,i-1}w_{i-1}^2 &= u_iw_{i-1}
\end{aligned}
\tag{2-16.20}
$$




We need not solve these equations with absolute accuracy. We can
count on the approximate orthogonality of the $w_k$ vectors. As soon
as a new $w_i$ is obtained, we immediately dot it with the previous
$w_k$, thus testing its orthogonality. In view of the rounding errors
we do not get zero, but the components of a symmetric matrix:

$$
\begin{aligned}
w_iw_k &= \epsilon_{ik} \qquad (k = 1, 2, \ldots, i - 1) \\
w_i^2 &= 1 + \epsilon_{ii}
\end{aligned}
\tag{2-16.21}
$$

These $\epsilon_{ik}$, however, are small, and thus it suffices to obtain the
solution of the system (20) in two steps. First we evaluate the
preliminary quantities (11):

$$p_{ik}' = u_iw_k \qquad (k < i) \tag{2-16.22}$$

and then we correct them according to the following scheme:

$$p_{ik} = p_{ik}' - (\epsilon_{k1}p_{i1}' + \cdots + \epsilon_{k,i-1}\,p_{i,i-1}') \tag{2-16.23}$$


This correction scheme is the only change compared with the
previous procedure. Having obtained $p_{ik}$ we again generate the
vector $w_i'$ according to (10) and normalize its length according
to (12). Also $p_{ii}$ is again defined by (13), without further correction.
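The effect of the correction (23) can be illustrated with a short experiment. The sketch below is not the author's computation: it stores each $w_i$ rounded (rather than truncated) to a chosen number of decimal places, carries all other quantities in ordinary floating point, and uses as test data the normalized 5 by 5 matrix of § 18. The function names and the `correct` switch are assumptions made for the illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalize_rounded(A, digits, correct):
    """Orthogonalize the columns of A, rounding each w_i to `digits` places.

    With correct=True the p_ik are adjusted by (2-16.23) using the stored
    eps_ik; with correct=False the preliminary values (2-16.22) are used.
    Returns the vectors w_i and the matrix eps of (2-16.21).
    """
    n = len(A)
    columns = [[A[r][c] for r in range(n)] for c in range(n)]
    W = []
    eps = [[0.0] * n for _ in range(n)]
    for i, u in enumerate(columns):
        p_prelim = [dot(u, wk) for wk in W]                        # (2-16.22)
        if correct:
            p = [p_prelim[k] - sum(eps[k][j] * p_prelim[j] for j in range(i))
                 for k in range(i)]                                 # (2-16.23)
        else:
            p = p_prelim
        w = list(u)
        for pk, wk in zip(p, W):
            w = [a - pk * b for a, b in zip(w, wk)]                # (2-16.10)
        length = math.sqrt(dot(w, w))
        w = [round(a / length, digits) for a in w]                 # (2-16.12), rounded
        for k, wk in enumerate(W):
            eps[i][k] = eps[k][i] = dot(w, wk)                     # (2-16.21)
        eps[i][i] = dot(w, w) - 1.0
        W.append(w)
    return W, eps

def max_deviation(eps):
    return max(abs(e) for row in eps for e in row)

# Normalized matrix of the example in section 18 (symmetric, unit diagonal).
A18 = [[1.0, 0.99877, 0.99297, 0.98341, 0.99580],
       [0.99877, 1.0, 0.99325, 0.98044, 0.99363],
       [0.99297, 0.99325, 1.0, 0.98233, 0.98776],
       [0.98341, 0.98044, 0.98233, 1.0, 0.98749],
       [0.99580, 0.99363, 0.98776, 0.98749, 1.0]]

_, eps_corrected = orthogonalize_rounded(A18, 6, correct=True)
_, eps_plain = orthogonalize_rounded(A18, 6, correct=False)
print(max_deviation(eps_corrected))
print(max_deviation(eps_plain))
```

With six stored decimals the corrected scheme keeps every $\epsilon_{ik}$ at the level of the rounding of the $w_i$ themselves, whereas without the correction the small error in the length of $w_1$ is magnified by the factor $p_{21}'/p_{22}$ already in the second vector.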

The orthogonality of the vector system $w_i$ within the chosen
number of decimal places is now insured by the construction of each
new vector. It is important, however, that in the evaluation of the
$p_{ik}$ we shall keep a sufficient number of decimal places. In highly
skew-angular systems the length of $w_i'$ may become very small because
we may lose several decimal places in the construction of $w_i'$. We
have to make up for the loss of significant figures by adding a
corresponding number of significant figures in writing down the $p_{ik}$,
without demanding, however, an increased accuracy of the previously
obtained $w_k$. Hence it cannot happen that the entire calculation has
to be repeated with increased accuracy, because of a surprise
encountered during the course of our calculations. The worst that
can happen (excepting systems which are so nearly singular that an
exact solution of (20) is demanded, in which case the system is void
of any physical significance) is that the quantities (22) and (23) may be
demanded with maximum accuracy. In the anticipation of this
possibility it is advisable to obtain and store the $\epsilon_{ik}$ with full accuracy.




The number of decimal places to which the vectors $w_i$ may be
truncated can be decided as follows. We interpret the truncation of
the vector $w_i$, determined according to (10) and (12), as a modification
of the vector $u_i$, which is the ith column of the given matrix. If
$w_i$ is truncated to $\mu$ decimal places, the rounding errors in any
component cannot exceed $\tfrac{1}{2} \cdot 10^{-\mu}$. The length of $w_i'$ cannot exceed the
length of $u_i$. Consequently the vector $\bar{u}_i$, which takes the place of
$u_i$ in the numerical inversion process, cannot differ from $u_i$ by an
amount which is more than $0.5 \cdot 10^{-\mu}$ times the length of $u_i$. This
shows that if the elements of the ith column of $A$, after dividing by
the length of that column, can be guaranteed to $\mu$ decimal places, it
suffices to truncate $w_i$ to $\mu$ decimal places. We have then succeeded
in replacing the given matrix $A$ by another, but numerically
equivalent, matrix $\bar{A}$, which is split into the product of two matrices,
according to (15). Now it is true that the matrix $W$ is not exactly
orthogonal, and thus its inverse is not exactly $\tilde{W}$. Since, however, we
possess the matrix $\epsilon$ [cf. (21)], we can correct $\tilde{W}$ according to

$$W^{-1} = \tilde{W} - \epsilon\tilde{W} \tag{2-16.24}$$

The matrix $P$ is exactly triangular and its inverse is obtainable to any
degree of accuracy, according to the numerical scheme of § 17.

In actual fact it will seldom be necessary to apply the correction
(24) to the simple inverse $\tilde{W}$, nor will it be necessary to invert $P$ with
excessive accuracy. The numerical certainty of $A^{-1}$ is usually much
smaller than the numerical certainty of $A$ itself. Let the true matrix
$A_0$ be equal to the given truncated matrix $A$ plus a small but unknown
correction:

$$A_0 = A - \epsilon A_1$$

We will assume that the columns of $A$ have been normalized to the
length 1 so that the elements of $A$ are all between $\pm 1$. Now

$$A_0^{-1} = A^{-1} + \epsilon A^{-1}A_1A^{-1} + \cdots$$

The relative error of $A_0^{-1}$ on account of the presence of $A_1$ would be
of the same order of magnitude as $\epsilon A_1$ itself, were it not for the
postmultiplication by $A^{-1}$. Since the average order of the inverted
matrix is usually much larger than 1, this multiplication will cause
a considerable increase in the uncertainty of $A^{-1}$. Hence it is generally
not necessary to ascertain the inverse of a matrix to the same number
of significant figures to which the original matrix was given. This
is not so, however, if $A$ happens to be nearly orthogonal. But even
then the relative accuracy of $A^{-1}$ cannot be greater than that of the
original matrix.


17. Inversion of a triangular matrix. A triangular matrix can be
inverted by a simple numerical algorithm. Algebraically we proceed
as follows. The equations

$$
\begin{aligned}
u_1 &= p_{11}w_1 \\
u_2 &= p_{21}w_1 + p_{22}w_2 \\
&\;\;\vdots \\
u_n &= p_{n1}w_1 + p_{n2}w_2 + \cdots + p_{nn}w_n
\end{aligned}
\tag{2-17.1}
$$

have the property that every new equation brings in but one new
unknown. Hence the first equation can be solved for $w_1$, by dividing
by $p_{11}$. Then we solve the second equation for $w_2$, substituting for $w_1$
the previously obtained value. Then the third equation is solved for
$w_3$, substituting for $w_1$ and $w_2$ their values. And so we proceed, until
we arrive at $w_n$, which becomes a linear combination of $u_1, u_2, \ldots, u_n$:

$$
\begin{aligned}
w_1 &= q_{11}u_1 \\
w_2 &= q_{21}u_1 + q_{22}u_2 \\
&\;\;\vdots \\
w_n &= q_{n1}u_1 + q_{n2}u_2 + \cdots + q_{nn}u_n
\end{aligned}
\tag{2-17.2}
$$

We see that the inverse of a triangular matrix is once more a
triangular matrix.

In a numerical algorithm the operation with symbols has to be 
abandoned and a scheme developed which operates with numbers 
only. Our ordinary arithmetical operations—such as longhand 
division for example—are based on complicated algebraic operations, 
but in the actual numerical algorithm the algebraic operations are 
eliminated and replaced by purely numerical processes. Often we are 
not even aware what algebraic process underlies a certain numerical 
algorithm. 

The numerical inversion of a triangular matrix proceeds as follows.
We write down the given triangular matrix $P$ in transposed form.
Hence the upper and not the lower triangle is now filled up with
elements. In the lower triangle we are going to put the elements of
the inverse matrix $Q = P^{-1}$. The dividing line between the two
matrices is the diagonal. Here we will immediately write down the $q_{ii}$
as the reciprocals of the $p_{ii}$ and not as the original $p_{ii}$. The final
construction will thus look as follows.

$$
\begin{array}{ccccc}
q_{11} & p_{21} & p_{31} & \cdots & p_{n1} \\
q_{21} & q_{22} & p_{32} & \cdots & p_{n2} \\
q_{31} & q_{32} & q_{33} & \cdots & p_{n3} \\
\vdots & & & & \vdots \\
q_{n1} & q_{n2} & q_{n3} & \cdots & q_{nn}
\end{array}
$$

Now an arbitrary $q_{ik}$ $(i > k)$ can be constructed as follows. Two
rows are involved in the construction; viz., the row $i$ and the row $k$.
The row $i$ involves the elements of the lower triangle, the row $k$ the
elements of the upper triangle. For example, the element $q_{53}$ will be
constructed as follows. We will use the fifth row below and the
third row above. We start from the underlined pivotal element $q_{55}$
and go backward. We multiply the third row and the fifth row
element by element, but with the understanding that we stop at a
point where the pivotal element of the upper row is reached. Hence
in the case of $q_{53}$ only two terms are left:

$$q_{55}p_{53} + q_{54}p_{43}$$

because the next product would involve $q_{53}$ which we do not have.
The resulting sum is multiplied by $q_{33}$ (the upper pivotal element), and
the final answer is reversed in sign.

$$q_{53} = -(q_{55}p_{53} + q_{54}p_{43})\,q_{33}$$
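We show the operation of this algorithm with the help of a simple
example. Let us invert the following triangular matrix:

$$
P = \begin{pmatrix} 2 & & & \\ 3 & 5 & & \\ 0 & -7 & 10 & \\ 6 & 1 & 8 & 1 \end{pmatrix}
$$

Numerical construction of the inverse matrix:

$$
\begin{array}{cccc}
0.5 & 3 & 0 & 6 \\
-0.3 & 0.2 & -7 & 1 \\
-0.21 & 0.14 & 0.1 & 8 \\
-1.02 & -1.32 & -0.8 & 1
\end{array}
$$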




Element 21: Multiply $0.2 \cdot 3$ and still multiply by 0.5, changing the
sign to minus: $-0.2 \cdot 3 \cdot 0.5 = -0.3$

Element 42: Multiply $1 \cdot 1 + (-0.8)(-7) = 6.6$ and then multiply
by 0.2, changing the sign: $-6.6 \cdot 0.2 = -1.32$


and so on. 
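We have thus obtained our $Q$ matrix:

$$
Q = \begin{pmatrix} 0.5 & & & \\ -0.3 & 0.2 & & \\ -0.21 & 0.14 & 0.1 & \\ -1.02 & -1.32 & -0.8 & 1 \end{pmatrix}
$$

We check our calculations by forming the product $PQ$ which has to
come out as the unit matrix $I$. In order to multiply column by
column, we transpose the first factor:

$$
\begin{pmatrix} 2 & 3 & 0 & 6 \\ 0 & 5 & -7 & 1 \\ 0 & 0 & 10 & 8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} 0.5 & 0 & 0 & 0 \\ -0.3 & 0.2 & 0 & 0 \\ -0.21 & 0.14 & 0.1 & 0 \\ -1.02 & -1.32 & -0.8 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
$$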


The construction of the inverse of the given triangular matrix is thus 
completed. 
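The numerical scheme of this section translates directly into a short routine. The following sketch assumes the matrix is given as a full square array with zeros above the diagonal; the function name is introduced here, and exact fractions are used so that the worked example is reproduced without rounding.

```python
from fractions import Fraction

def invert_lower_triangular(P):
    """Inverse Q of a lower triangular P, built element by element."""
    n = len(P)
    Q = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = 1 / Fraction(P[i][i])          # diagonal: reciprocals of p_ii
    for i in range(n):
        for k in range(i - 1, -1, -1):           # work backward from the diagonal
            # q_ii p_ik + q_{i,i-1} p_{i-1,k} + ... + q_{i,k+1} p_{k+1,k}
            s = sum(Q[i][j] * P[j][k] for j in range(k + 1, i + 1))
            Q[i][k] = -s * Q[k][k]               # times the upper pivot, sign reversed
    return Q

P = [[2, 0, 0, 0],
     [3, 5, 0, 0],
     [0, -7, 10, 0],
     [6, 1, 8, 1]]
Q = invert_lower_triangular(P)
for row in Q:
    print([float(v) for v in row])
# [0.5, 0.0, 0.0, 0.0]
# [-0.3, 0.2, 0.0, 0.0]
# [-0.21, 0.14, 0.1, 0.0]
# [-1.02, -1.32, -0.8, 1.0]
```

Each row of `Q` is filled from the diagonal leftward, which is the order in which the text constructs the elements 21 and 42.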


18. Numerical example for the successive orthogonalization of a
matrix. The following 5 by 5 matrix was obtained as the result of
an industrial least square problem. Matrices of this kind are always
symmetric and they have a further property: they are "positive
definite." This means that the quadratic form associated with the
matrix, $\sum a_{ik}x_ix_k$, cannot become zero or negative for any choice of
the variables $x_i$ (except the trivial case that all the $x_i$ vanish).

Such matrices can be properly normalized by the following device.
We transform the variables $x_i$ by changing their scale:

$$\bar{x}_i = \sqrt{a_{ii}}\,x_i \tag{2-18.1}$$

and we divide the ith equation by $\sqrt{a_{ii}}$. This amounts to the
following transformation of the elements:

$$\bar{a}_{ik} = \frac{a_{ik}}{\sqrt{a_{ii}a_{kk}}} \tag{2-18.2}$$

The result of this transformation is that all the diagonal elements of
the matrix become 1. Moreover, all the nondiagonal elements must
lie between $\pm 1$. In our problem the initial normalization (2)
has been performed in advance and we will start immediately with the
normalized matrix. The elements of this matrix are given, in harmony
with the accuracy of the observations from which they were derived,
to five decimal places:

$$
A = \begin{pmatrix}
1 & 0.99877 & 0.99297 & 0.98341 & 0.99580 \\
0.99877 & 1 & 0.99325 & 0.98044 & 0.99363 \\
0.99297 & 0.99325 & 1 & 0.98233 & 0.98776 \\
0.98341 & 0.98044 & 0.98233 & 1 & 0.98749 \\
0.99580 & 0.99363 & 0.98776 & 0.98749 & 1
\end{pmatrix}
$$


The fact that all the elements are so near to 1 indicates that the system 
is very skew-angular. 
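The normalization (2) is a one-line operation once the diagonal is known. The sketch below uses a small hypothetical positive definite matrix, chosen only for illustration and not taken from the text; the function name is likewise an assumption.

```python
import math

def normalize(A):
    """Apply (2-18.2): divide a_ik by sqrt(a_ii * a_kk)."""
    n = len(A)
    scale = [math.sqrt(A[i][i]) for i in range(n)]
    return [[A[i][k] / (scale[i] * scale[k]) for k in range(n)]
            for i in range(n)]

# Hypothetical symmetric, positive definite matrix.
A = [[4.0, 2.0, 0.6],
     [2.0, 9.0, 1.5],
     [0.6, 1.5, 1.0]]

N = normalize(A)
for row in N:
    print([round(v, 5) for v in row])
```

After the normalization every diagonal element equals 1 and every off-diagonal element lies between $-1$ and $+1$; the closer these elements come to $\pm 1$, the more skew-angular the system.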

In order to ensure that the "equivalent matrix" does not deviate
from the given matrix by more than $\tfrac{1}{2}$ unit of the fifth decimal place,
we will keep six decimal places in the formation of the matrix $W$.
The evaluation of the $p_{ik}$ occurred with an accuracy of nine decimal
places because the calculation of the $w_i'$ showed that two decimal
places were lost in the process of forming the differences (16.10). The
corrections (16.23) were thus constantly demanded. The following
table lists in its columns the vectors $w_i$ as they emerged in succession
during the orthogonalization and normalization process (decimal
point in front):

$$
W = \begin{pmatrix}
449819 & -074358 & -321824 & -476554 & -679308 \\
449266 & 636326 & -372075 & -502115 & 051862 \\
446657 & 360073 & 744406 & -317396 & 126348 \\
442357 & -582393 & 297446 & 582606 & -192965 \\
447930 & -347456 & -339669 & -283912 & 694732
\end{pmatrix}
$$


The column-by-column product of this matrix by itself—obtained 
gradually but here shown in its final appearance—demonstrated that 
the deviations from orthogonality never reached 10-8. The following 
table gives this product matrix (the e-matrix of the text), multiplied 
by 10": 


$$
10^{12}\,\varepsilon = \begin{pmatrix}
547515 & -464906 & 338988 & 148874 & 292931 \\
-464906 & -265846 & -742034 & -026372 & 290133 \\
338988 & -742034 & -062086 & 518499 & -317668 \\
148874 & -026372 & 518499 & -816063 & -035420 \\
292931 & 290133 & -317668 & -035420 & 886061
\end{pmatrix}
$$
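The orthogonality check described above can be repeated in exact integer arithmetic, since the six-digit entries of W give products with exactly twelve decimals. The following is a minimal Python sketch, not the author's working scheme; the names `W` and `epsilon_matrix` are illustrative, and the entries of `W` are read from the table on the assumption that its first row is (449819, −074358, −321824, −476554, −679308) and that the fourth entry of the second row is positive, as the orthogonality conditions require.

```python
# Entries of W as six-digit integers (decimal point in front). The first row
# and the positive sign of W[1][3] are assumptions restored from orthogonality.
W = [
    [449819,  -74358, -321824, -476554, -679308],
    [449266,  636326, -372075,  502115,   51862],
    [446657,  360073,  744406, -317396,  126348],
    [442357, -582393,  297446,  582606, -192965],
    [447930, -347456, -339669, -283912,  694732],
]


def epsilon_matrix(w):
    """Return 10^12 times (column-by-column product of w with itself, minus I)."""
    n = len(w)
    scale = 10 ** 12
    eps = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            dot = sum(w[r][i] * w[r][k] for r in range(n))
            eps[i][k] = dot - (scale if i == k else 0)
    return eps


eps = epsilon_matrix(W)
largest = max(abs(e) for row in eps for e in row)
# Largest deviation from orthogonality, to be compared with the bound in the text.
print(largest / 10 ** 12)
```

Working with integers avoids any additional rounding, so the printed deviation reflects only the rounding of W itself to six decimal places.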


The elements of the P-matrix emerged from row to row. Apart from
the diagonal elements (which were obtained by taking the square
root of the sum of the squares of the elements of the not yet normalized $w_i$ vectors) every
other element is listed as the sum of two elements, viz., the preliminary
$p_{ik}$, obtained according to (16.22), and the correction,
contained in the second term of (16.23) and listed directly below the
corresponding $p_{ik}$:


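(The corrections are given in units of the ninth decimal place.)

$$
\begin{array}{lrrrrr}
p_{1k} = & 2.223116 \\[6pt]
 & 2.220954971 \\
\text{corr.:} & -1216 \\
p_{2k} = & 2.220953755 & 0.003458902 \\[6pt]
 & 2.216535115 & 0.002963282 \\
\text{corr.:} & -1212 & +1031 \\
p_{3k} = & 2.216533903 & 0.002964313 & 0.01195886 \\[6pt]
 & 2.206282826 & -0.021036753 & 0.011996452 \\
\text{corr.:} & -1222 & +1029 & -763 \\
p_{4k} = & 2.206281604 & -0.021035724 & 0.011995689 & 0.01410346 \\[6pt]
 & 2.220276968 & -0.008670650 & -0.000826800 & 0.002258580 \\
\text{corr.:} & -1220 & +1029 & -760 & -328 \\
p_{5k} = & 2.220275748 & -0.008669621 & -0.000827560 & 0.002258252 & 0.004058572
\end{array}
$$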


An interesting conclusion can be drawn concerning the determinant
of the given matrix A. The equation (16.15) shows that the
determinant of A is equal to the product of the determinants of
W and P. But the determinant of the orthogonal matrix W can only
be ±1, the sign being decided by taking the product of the signs
of the diagonal elements. In our case all the diagonal terms of W
are positive and thus the sign is plus. The determinant of P, being
a triangular matrix, is simply the product of the diagonal terms. This
product is in our problem $\Delta = 5.26367 \cdot 10^{-9}$, thus showing the
extraordinarily skew-angular character of the given linear system.
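The value of the determinant can be checked by multiplying the five diagonal terms of P directly. The lines below are a minimal Python sketch; the name `p_diagonal` is an illustrative choice and the values are copied from the p-table above.

```python
import math

# Diagonal terms p_11 ... p_55 of the triangular matrix P, as listed above.
p_diagonal = [2.223116, 0.003458902, 0.01195886, 0.01410346, 0.004058572]

# The determinant of W is +1 here, so det A equals the product of these terms.
det_a = math.prod(p_diagonal)
print(f"{det_a:.4e}")
```

The smallness of the product, compared with the value 1 of every diagonal element of A, is what the text calls the skew-angular character of the system.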

We now come to the inversion of the P-matrix, according to the 
numerical scheme of §17. This yields the Q-matrix: 


$$
\begin{pmatrix}
0.449819082 \\
-288.827893 & 289.109087 \\
-11.7789627 & -71.6631707 & 83.6200106 \\
-491.144346 & 492.1677422 & -71.1229472 & 70.9045865 \\
-592.171546 & 329.1112848 & 56.6243775 & -39.4524044 & 246.392080
\end{pmatrix}
$$


(The unmarked elements are all zero.) Finally (16.19) shows that the
column by row product of the Q matrix by the W matrix gives the
desired inverse $A^{-1}$:


$$
A^{-1} = \begin{pmatrix}
661.793 & -456.547 & -31.482 & -6.990 & -167.376 \\
-456.526 & 474.825 & -63.888 & 33.556 & 12.778 \\
-31.499 & -63.878 & 91.976 & -27.490 & 31.131 \\
-6.970 & 33.542 & -27.491 & 48.922 & -47.545 \\
-167.401 & 12.800 & 31.128 & -47.539 & 171.176
\end{pmatrix}
$$


The exact inverse of A should be a symmetric matrix. Our inverse
shows slight deviations from symmetry. It is unnecessary, however,
to try to eliminate these deviations since we know that our $A^{-1}$ is the
exact inverse of a matrix which differs from the given matrix by less
than ½ unit of the fifth decimal place, which is outside the reach of our
measurements. Any conclusions involving quantities of such smallness
would be automatically spurious. That the inaccuracies occur
already in the second decimal is not the fault of the inversion
process but a consequence of the strongly skew-angular character of
A. The columns of A had a relative accuracy of about $1 \cdot 10^{-5}$. In the
columns of $A^{-1}$ we cannot expect a relative accuracy exceeding
$5 \cdot 10^{-4}$. In actual fact the deviations from symmetry are considerably
smaller than the permissible maximum errors.


19. Triangularization of a matrix. The orthogonalization process 
which we have discussed in §16 has still another aspect. If we 
transpose (16.15) and premultiply the old equation by the new one, 
we obtain

$$\tilde{A}A = P\tilde{P} \tag{2-19.1}$$


In this equation the orthogonal matrix W has dropped out completely. 
This shows that we can obtain the triangular matrix P even without 
explicitly generating the orthogonal vectors $w_i$. While this procedure
is much shorter than the previous one, it has the disadvantage that we 
cannot keep the rounding errors so simply under control as in the 
previous case. However, the method has its advantages if A is not 
too skew-angular and we can operate with a sufficient number of 
extra decimal places to keep the rounding errors below the danger 
point. 

The equation (1), written out in its elements, is solvable by going 
from row to row and in each row systematically from the left to the 
right, until the diagonal element is reached. Every equation brings
in one new unknown which can thus be eliminated. Finally the
entire P matrix is constructed. Then again we invert P and obtain 
the new triangular matrix Q. Now the equation (16.15) shows that
the matrix W can be eliminated by postmultiplying by $\tilde{Q}$:

$$W = A\tilde{Q} \tag{2-19.2}$$

Substitution in (16.19) gives

$$A^{-1} = \tilde{Q}Q\tilde{A} \tag{2-19.3}$$


The matrix $\tilde{A}A$ is automatically a symmetric and positive definite
matrix. We can conceive the equation (1) as an expression of the
fact that any symmetric and positive definite matrix can be split into
the row-by-row product of a triangular matrix by itself. Now it so
happens that in many problems of applied mathematics the given
matrix A is already in itself a symmetric and positive definite matrix.
In this case it is not necessary to premultiply by $\tilde{A}$, but we can put
directly

$$A = P\tilde{P} \tag{2-19.4}$$


Not only do we save now the column-by-column multiplication of 
the given matrix by itself—in the case of large matrices an elaborate 
operation—but we have the further great advantage that the matrix P 
will be much less skew-angular than the original matrix has been. 
The product of the diagonal elements of P is now only the square root 
of the original determinant. Hence the loss in significant figures is 
now much less pronounced than before. This reduction of a 
symmetric matrix to a triangular matrix is thus strongly advocated if 
the matrix is the result of a least-square problem, provided that we 
have enough decimal places at our disposal to counteract the 
accumulation of rounding errors. If the given matrix A is not 
excessively skew-angular, we will succeed with the triangularization 
without constant corrections. Finally we invert P and obtain $A^{-1}$ in
the form

$$A^{-1} = \tilde{Q}Q \tag{2-19.5}$$


If A is not symmetric, the preliminary symmetrization according to
equation (1), that is, premultiplication by $\tilde{A}$, cannot be avoided. Even
then the scheme is remarkably simple, compared with the more
elaborate scheme of § 18. However, the complete scheme is always
safe and will always be preferable if the given matrix is strongly
skew-angular, or if its order is so high that the extra number of decimal
places is still unable to counteract the accumulation of rounding
errors.
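For a symmetric and positive definite matrix the triangularization of this section can be sketched as follows. This is a minimal Python illustration, not the author's numerical scheme of § 17; the function names `triangularize`, `invert_lower`, and `inverse_spd` and the 3 by 3 example matrix are assumptions made for the illustration, and P is taken to be a lower triangular matrix.

```python
import math


def triangularize(a):
    """Split a symmetric positive definite matrix into the row-by-row
    product of a lower triangular matrix P by itself."""
    n = len(a)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):              # go from row to row
        for k in range(i + 1):      # left to right, up to the diagonal
            s = sum(p[i][j] * p[k][j] for j in range(k))
            if i == k:
                p[i][i] = math.sqrt(a[i][i] - s)
            else:
                p[i][k] = (a[i][k] - s) / p[k][k]
    return p


def invert_lower(p):
    """Invert a lower triangular matrix row by row."""
    n = len(p)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q[i][i] = 1.0 / p[i][i]
        for k in range(i):
            s = sum(p[i][j] * q[j][k] for j in range(k, i))
            q[i][k] = -s / p[i][i]
    return q


def inverse_spd(a):
    """Return the inverse as the column-by-column product of Q = P^{-1} by itself."""
    q = invert_lower(triangularize(a))
    n = len(a)
    return [[sum(q[j][i] * q[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]


a_example = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
p_example = triangularize(a_example)
inv_example = inverse_spd(a_example)
```

The square root in the diagonal step fails if the matrix is not positive definite to working precision, which is one way the loss of significant figures mentioned in the text makes itself felt.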


20. Inversion of a complex matrix. Occasionally linear systems 
with complex elements have to be solved, and the question arises of 
how to invert a matrix whose elements are complex numbers. In 
principle this problem needs no special attention, since complex 
numbers satisfy all the rules of ordinary algebra, and all operations 
with real numbers carry over to the realm of complex numbers with 
the added information that $i^2 = -1$. However, the numerical
operations with complex numbers are often prone to errors and we
often prefer to translate everything into the realm of real numbers.
We can do that with complex matrices, at the cost of increasing the
size of a matrix of n rows and columns to a matrix of 2n rows and
columns. The amount of work in inverting such a matrix is almost
8 times as great. However, the algebra of complex numbers shows
that this factor is reduced to only 4; this corresponds to the fact that
multiplication of two complex numbers amounts to 4 times the work
of a simple multiplication. We gain greatly, however, in the simplicity
of the resulting work scheme.

Let C be a complex matrix which can be split into a real and an
imaginary part.

$$C = A + iB \tag{2-20.1}$$

Consider the solution of the equation

$$Cz = c \tag{2-20.2}$$

Both vectors z and c are complex. We will write them in the form

$$z = x + iy, \qquad c = a + ib \tag{2-20.3}$$

Then the algebraic equation

$$(A + iB)(x + iy) = a + ib \tag{2-20.4}$$

splits into the two real equations

$$Ax - By = a, \qquad Bx + Ay = b \tag{2-20.5}$$


The components of the vectors x and y can be combined into one
vector of 2n components. The same can be done with the two
vectors a and b:

$$\bar{x} = (x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n)$$

$$\bar{a} = (a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n)$$

Our linear system can now be written in the form

$$\bar{C}\bar{x} = \bar{a} \tag{2-20.6}$$

where the 2n by 2n matrix $\bar{C}$ is defined as

$$\bar{C} = \begin{pmatrix} A & -B \\ B & A \end{pmatrix} \tag{2-20.7}$$


We can avoid, however, the inversion of this large matrix, if we are in
the possession of the inverse of A, i.e., $A^{-1}$. We premultiply the first
equation of (5) by $A^{-1}$ and thus liberate x.

$$x - A^{-1}By = A^{-1}a \tag{2-20.8}$$

Hence x is now expressible in terms of y.

$$x = A^{-1}By + A^{-1}a \tag{2-20.9}$$

Substituting in the second equation, we obtain

$$(A + BA^{-1}B)y = b - BA^{-1}a \tag{2-20.10}$$

We will now invert the n by n matrix

$$A + BA^{-1}B$$

Let the inverse of this matrix be $A_1$:

$$A_1 = (A + BA^{-1}B)^{-1}$$

After the proper simplification the final result of our inversion
scheme can be expressed as follows. The inverse $C^{-1}$ of the complex
matrix C = A + iB may be written as

$$C^{-1} = A_1 - iB_1 \tag{2-20.11}$$

where

$$B_1 = A_1BA^{-1} = A^{-1}BA_1 \tag{2-20.12}$$
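The scheme (11), (12) uses only real matrix operations, which can be sketched as follows. This is a minimal Python illustration under the assumption that A itself is invertible; the helper names `matmul`, `matadd`, `inverse`, and `complex_inverse` are illustrative, and the Gauss-Jordan routine merely stands in for whatever real inversion scheme is available.

```python
def matmul(x, y):
    """Ordinary row-by-column product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def matadd(x, y, sign=1.0):
    """Return x + sign * y."""
    return [[x[i][j] + sign * y[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]


def inverse(a):
    """Gauss-Jordan inversion of a real matrix with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def complex_inverse(a, b):
    """Return (A1, B1) such that the inverse of A + iB is A1 - i*B1."""
    a_inv = inverse(a)
    a1 = inverse(matadd(a, matmul(matmul(b, a_inv), b)))   # (A + B A^-1 B)^-1
    b1 = matmul(matmul(a1, b), a_inv)                      # A1 B A^-1
    return a1, b1


a_real = [[2.0, 1.0], [1.0, 3.0]]
b_imag = [[1.0, 0.0], [2.0, 1.0]]
a1, b1 = complex_inverse(a_real, b_imag)
```

Two real inversions of order n replace one real inversion of order 2n; the result can be verified by forming the product of A + iB with $A_1 - iB_1$ in complex arithmetic.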


21. Solution of codiagonal systems. In many problems of analysis 
a special type of equation occurs as the result of changing a given 
linear differential equation to a difference equation. The matrix of
the resulting linear system has many zeros. The only nonzero
elements of the matrix are in the diagonal and a few “codiagonals”
(these are lines parallel to the main diagonal, on both sides of the
main diagonal). Hence the nonvanishing elements of the matrix are
contained in a relatively narrow band around the main diagonal.
Such systems can be solved without going through the regular inversion
technique. If the number of codiagonals occupied by elements is
small, the original large set of equations is reducible to a small set of 
equations. The remaining simultaneous set contains only as many 
unknowns as there are diagonals occupied by elements. 

The accompanying figure demonstrates the situation graphically, 
before going into analytical details. A system with three codiagonals 
is pictured. The codiagonals below the main diagonal are not 
indicated specifically, since for the general procedure it makes no 
difference whether the matrix below the main diagonal is solid (i.e., 
completely occupied by elements) or composed of a few codiagonals 
and the rest of the elements vanishing. 


We reduce the original matrix of n rows and n columns to the 
heavily drawn matrix of n — m rows and n — m columns which is 
purely triangular. Since we know how to invert a triangular matrix, 
the reduced problem can be solved at once. 

The original operation has the form

$$Ax = b \tag{2-21.1}$$

We will agree that the notation $\bar{A}$ shall refer to the new triangular
matrix, obtained by omitting the first m columns and the last m rows.
Similarly, $\bar{x}$ shall denote the vector $x = (x_1, x_2, \ldots, x_n)$ but omitting
the first m elements.

$$\bar{x} = (x_{m+1}, x_{m+2}, \ldots, x_n) \tag{2-21.2}$$

The omission of the last m equations is permissible since it merely
means that for the time being we will not make use of the information
contained in the last m equations. Later we will satisfy the remaining
equations too. The first m columns cannot be omitted, but it is
permissible to carry over these columns to the right side. The
original columns of A are denoted by $u_1, u_2, \ldots, u_n$. Similarly the
notation $\bar{b}$ (and likewise $\bar{u}_k$) will indicate that the last m elements of the vector
have been omitted. Hence the given system, disregarding the last m
equations, may be written

$$\bar{A}\bar{x} = \bar{b} - (x_1\bar{u}_1 + x_2\bar{u}_2 + \cdots + x_m\bar{u}_m) \tag{2-21.3}$$

In the figure we have chosen m = 3.
Now $\bar{A}$ is a triangular matrix P whose inverse Q can be constructed
according to the algorithm described in Section 17. We thus obtain

$$\bar{x} = Q\bar{b} - x_1Q\bar{u}_1 - x_2Q\bar{u}_2 - \cdots - x_mQ\bar{u}_m \tag{2-21.4}$$

This equation expresses $x_{m+1}, x_{m+2}, \ldots, x_n$ in terms of the first m
unknowns $x_1, x_2, \ldots, x_m$.

We now come to the solution of the last m equations. This is now
an easy task, since all the $x_k$ beyond k = m are expressed in terms of
the first m unknowns. Hence we have a simultaneous set of m linear
equations in m unknowns, which causes no particular difficulties,
since m is small. The originally large set of n equations is reduced to
the much smaller set of only m equations. Finally, after obtaining
the $x_1, x_2, \ldots, x_m$ by solving the reduced set, we substitute these values
in (4) and thus obtain the complete solution of our problem.
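The reduction to m equations can be sketched as follows. This is a minimal Python illustration, assuming that the matrix has no elements beyond the m-th codiagonal above the main diagonal and that none of the elements of that m-th codiagonal vanishes. The function names are illustrative, indices start at 0, and the triangular systems are solved by direct substitution instead of forming Q explicitly.

```python
def forward_substitution(p, rhs):
    """Solve p y = rhs for a lower triangular p with nonzero diagonal."""
    y = []
    for i, row in enumerate(p):
        s = sum(row[j] * y[j] for j in range(i))
        y.append((rhs[i] - s) / row[i])
    return y


def solve_dense(a, b):
    """Gaussian elimination with partial pivoting for the small m by m set."""
    n = len(a)
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def solve_codiagonal(a, b, m):
    """Solve a x = b when a[i][j] == 0 for j > i + m and a[i][i + m] != 0."""
    n = len(a)
    k = n - m
    # Omit the first m columns and the last m rows: a triangular matrix.
    reduced = [[a[i][j + m] for j in range(k)] for i in range(k)]
    xi = forward_substitution(reduced, b[:k])
    eta = [forward_substitution(reduced, [a[i][c] for i in range(k)])
           for c in range(m)]
    # The last m equations become m equations for the first m unknowns.
    small, rhs = [], []
    for r in range(k, n):
        tail = a[r][m:]
        small.append([a[r][c] - sum(t * e for t, e in zip(tail, eta[c]))
                      for c in range(m)])
        rhs.append(b[r] - sum(t * v for t, v in zip(tail, xi)))
    head = solve_dense(small, rhs)
    rest = [xi[j] - sum(head[c] * eta[c][j] for c in range(m)) for j in range(k)]
    return head + rest


tridiagonal = [[2, -1, 0, 0, 0],
               [-1, 2, -1, 0, 0],
               [0, -1, 2, -1, 0],
               [0, 0, -1, 2, -1],
               [0, 0, 0, -1, 2]]
solution = solve_codiagonal(tridiagonal, [0, 0, 0, 0, 6], 1)
```

In the example one codiagonal is occupied above the main diagonal, so the five equations reduce to a single equation for the first unknown.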

The inversion of a triangular matrix P demands that none of the
diagonal elements of P shall vanish. If some of these elements vanish,
the method is still applicable with the following modification. We
replace the zero by 1 and put a compensating term on the right side
of the equation. Let us assume, for example, that in the example of
the figure the third codiagonal has a zero in the fourth and fifth
equations. These zeros are replaced by 1 and correspondingly $x_7$ is
put on the right side of the fourth, and $x_8$ on the right side of the
fifth equation. Inverting the system we now obtain an apparent
redundancy, since the $x_k$ are expressed not only in terms of $x_1, x_2, x_3$,
but also in terms of $x_7$ and $x_8$, while the last 3 equations cannot
determine more than 3 unknowns. In actual fact, however, the
fourth and the fifth equations give us two additional conditions which
are not yet satisfied. And thus we obtain 3 + 2 = 5 equations for
3 + 2 = 5 unknowns which can be solved, provided that the system
is not singular.


22. Matrix inversion by partitioning. The direct inversion of a 
large matrix is frequently not advisable, because of the eventual 
accumulation of rounding errors. Hence it is of importance to know 
that inversion of large matrices can be avoided by partitioning the 
matrix into smaller components. Consider a system of 20 equations 
with 20 unknowns, requiring the inversion of a 20 by 20 matrix. We 
might have coded the inversion of a 10 by 10 matrix. Then we can 
obtain the solution of our larger problem by two inversions of the 
10 by 10 type. Generally this method of reducing a large matrix to a 
combination of smaller matrices can be characterized as follows. 
Consider the matrix equation

$$Az = c \tag{2-22.1}$$


Let us partition the given large matrix A into four smaller parts according to the
following scheme.

$$A = \begin{pmatrix} A_1 & B_1 \\ A_2 & B_2 \end{pmatrix}$$

Here the upper parts have $m_1$ rows and the lower parts $m_2$ rows, while the
left parts have $m_1$ columns and the right parts $m_2$ columns. The partitioning divides
our original matrix A into four parts, viz.,
the two square matrices $A_1$ and $B_2$ and the
two nonsquare matrices $B_1$ and $A_2$. We
partition also the unknown vector z and
the given right side c.

$$z = (x_1, \ldots, x_{m_1}, y_1, \ldots, y_{m_2}), \qquad x = (x_1, x_2, \ldots, x_{m_1}), \qquad y = (y_1, y_2, \ldots, y_{m_2})$$

$$c = (a_1, \ldots, a_{m_1}, b_1, \ldots, b_{m_2}), \qquad a = (a_1, a_2, \ldots, a_{m_1}), \qquad b = (b_1, b_2, \ldots, b_{m_2})$$

Now the first $m_1$ equations of the given system may be written

$$A_1x + B_1y = a$$

The last $m_2 = n - m_1$ equations give

$$A_2x + B_2y = b$$

and thus the entire system of equations is now equivalent to the two
simultaneous vector equations

$$A_1x + B_1y = a, \qquad A_2x + B_2y = b \tag{2-22.2}$$


Let us now assume that we are able to invert the matrix $A_1$, obtaining
$A_1^{-1}$. We premultiply the first equation of (2) by $A_1^{-1}$ and thus
liberate x.

$$x = A_1^{-1}a - A_1^{-1}B_1y \tag{2-22.3}$$

Substitution in the second equation gives

$$(B_2 - A_2A_1^{-1}B_1)y = b - A_2A_1^{-1}a \tag{2-22.4}$$


We encounter here the product of matrices which are not of the
square type; that is, the numbers of rows and columns are not
necessarily the same. Actually the law of multiplying two matrices
together does not necessitate equality of the numbers of rows and
columns. Since the rows of the first matrix are multiplied by the
columns of the second matrix, it is necessary that the length of the
rows of the first factor shall correspond to the length of the columns
of the second factor. But this is the only condition, and generally we
can multiply a matrix of n rows and l columns with a matrix of l rows
and m columns. The product will be a matrix of n rows and m
columns, as indicated in the figure below.

$$(n, l)(l, m) = (n, m)$$

Now $A_2$ is a matrix of $m_2$ rows and $m_1$ columns, $A_1^{-1}$ is a matrix
of $m_1$ rows and $m_1$ columns. Hence $(m_2, m_1)(m_1, m_1) = (m_2, m_1)$
and thus the product $A_2A_1^{-1}$ is an $m_2$ by $m_1$ matrix. Multiplying
by $B_1$, which is an $m_1$ by $m_2$ matrix, we obtain
$(m_2, m_1)(m_1, m_2) = (m_2, m_2)$. Hence the product
$A_2A_1^{-1}B_1$ is a square matrix of $m_2$ rows and $m_2$ columns which
can be subtracted from $B_2$.
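We will now invert the matrix

$$B_3 = B_2 - A_2A_1^{-1}B_1 \tag{2-22.5}$$

and thus solve equation (4):

$$y = B_3^{-1}b - B_3^{-1}A_2A_1^{-1}a \tag{2-22.6}$$

We need the following matrix products:

$$C_1 = B_3^{-1}A_2A_1^{-1}, \qquad C_2 = A_1^{-1}B_1B_3^{-1}$$

Then

$$x = (A_1^{-1} + A_1^{-1}B_1C_1)a - C_2b, \qquad y = -C_1a + B_3^{-1}b$$

The inverse of the system (1) becomes

$$A^{-1} = \begin{pmatrix} A_1^{-1} + A_1^{-1}B_1C_1 & -C_2 \\ -C_1 & B_3^{-1} \end{pmatrix}$$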
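The partitioned inversion can be sketched as follows. This is a minimal Python illustration, assuming that the upper left part $A_1$ and the reduced matrix $B_2 - A_2A_1^{-1}B_1$ are both invertible; the function names and the 4 by 4 example are illustrative, and the Gauss-Jordan routine stands in for whatever coded inversion of the smaller size is available.

```python
def matmul(x, y):
    """Row-by-column product; the factors need not be square."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def combine(x, y, sign):
    """Return x + sign * y for matrices of equal shape."""
    return [[x[i][j] + sign * y[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def inverse_by_partitioning(a, m1):
    """Invert a using only inversions of order m1 and len(a) - m1."""
    a1 = [row[:m1] for row in a[:m1]]
    b1 = [row[m1:] for row in a[:m1]]
    a2 = [row[:m1] for row in a[m1:]]
    b2 = [row[m1:] for row in a[m1:]]
    a1_inv = inverse(a1)
    reduced_inv = inverse(combine(b2, matmul(matmul(a2, a1_inv), b1), -1.0))
    c1 = matmul(matmul(reduced_inv, a2), a1_inv)
    c2 = matmul(matmul(a1_inv, b1), reduced_inv)
    top_left = combine(a1_inv, matmul(matmul(a1_inv, b1), c1), 1.0)
    top = [left + [-v for v in right] for left, right in zip(top_left, c2)]
    bottom = [[-v for v in left] + right for left, right in zip(c1, reduced_inv)]
    return top + bottom


a_example = [[4.0, 1.0, 0.0, 1.0],
             [1.0, 5.0, 1.0, 0.0],
             [0.0, 1.0, 3.0, 1.0],
             [1.0, 0.0, 1.0, 4.0]]
inv_example = inverse_by_partitioning(a_example, 2)
```

Only two inversions of order 2 are performed for the 4 by 4 example; all remaining operations are matrix products.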


This method of inverting a large matrix by partitioning it into smaller
matrices can be conceived as a generalization of the Gaussian elimination
scheme. There we consider each equation separately and eliminate
one unknown at a time. The process has thus to be repeated n times. By
combining a certain number of equations into one system and inverting
a larger matrix, we can simultaneously eliminate a whole
group of unknowns and thus reduce the system in larger steps.
For example, if we have coded the inversion of a 5 by 5 matrix, a 
system of twenty equations with twenty unknowns can be reduced to 
a 15 by 15 system by eliminating five of the unknowns. Then again 
we eliminate five unknowns and reduce the system to 10 by 10. The 
next elimination reduces the system to 5 by 5, which can now 
be solved directly. The process had to be repeated 4 times, while the 
original Gaussian scheme would have required 20 successive trans- 
formations. The partition method has the further advantage that 
accumulation of rounding errors will be greatly retarded if each single 
inversion is performed with great precision by back-checking and 
subsequent corrections (cf. the next section). 


23. Perturbation methods. The word “perturbation” is taken 
from astronomy. When Newton discovered the law of gravity and 
laid the foundation to exact mathematical calculations, the orbits of 
the planets could be predicted with a high degree of mathematical 
accuracy. The predicted orbits, however, did not fit the actual
observations with sufficient accuracy. The discrepancy was caused
by the fact that the planets are subject not only to the gravity of the 
sun but to a minor degree to the gravity between planet and planet. 
This influence is much smaller than the influence of the sun, on 
account of the much smaller masses of the planets. The influence is 
nevertheless not negligible, and has to be added to the primary 
influence of the sun as a correction. The orbits of the planets are thus 
“perturbed” because of the added influence of the planetary masses 
and the “perturbation” can be mathematically calculated. If the 
masses of the planets were of the same order of magnitude as the 
mass of the sun, the mathematical problem of determining the orbits 
of the planets would be a practically unsolvable problem. The 
solution becomes possible by the fact that a good first approximation 
is available by neglecting the mutual influence of the planets and then 
adding it as a small correction. The discovery of the planet Neptune, 
based on the elaborate perturbation calculations of the planet 
Uranus by the eminent French astronomer Leverrier, and in our day 
the discovery of the planet Pluto, based on the perturbation of the 
orbit of Neptune, are impressive examples of the ingenuity of 
perturbation methods. 

In matrix algebra, perturbation methods can be used as a powerful 
tool for counteracting the disturbing influence of rounding errors. 
We design methods for solving linear equations or inverting a 
matrix. We would get exact results, were it not for rounding errors 
which interfere with the exactitude of our calculations and cause 
small errors whose cumulative effect is not negligible. We can always 
check our results by substituting in the defining equation and examin- 
ing how closely the right sides of the equations are satisfied. Since 
we are not interested in absolute accuracy, we may consider a 
certain result as sufficiently accurate. But it is equally possible that 
the equations did not check with the desired accuracy. Shall we now 
throw away our entire computation and start over again, using 
double or triple precision ? 

This is in fact seldom necessary. Even an inaccurate result can be
very valuable because the exact result may be calculable on the basis
of a perturbation procedure, without starting our computations over
again. All perturbation methods are characterized by a certain
“perturbation parameter” $\varepsilon$ which has to be sufficiently small. We
expand the exact solution into powers of $\varepsilon$. We thus obtain an
infinite expansion, but in actual fact only a few terms of this expansion
will be needed. If $\varepsilon$ is of the order of magnitude of $10^{-2}$, the
second power of $\varepsilon$ is of the order of magnitude $10^{-4}$, the third power
is of the order of magnitude $10^{-6}$. Hence it will hardly be necessary
to go beyond the third-order term of our expansion. Quite generally
the convergence of our expansion will always be sufficiently rapid if
we start with a first approximation which is not too far from the true
solution. The corrections are then available on the basis of a successive
iteration scheme, starting with a first crude result which in
itself is not of sufficient accuracy but yet sufficiently close to serve as
the basis of a perturbation procedure.

We will apply this perturbation method to three important cases.
First we consider the inversion of a matrix. Let B be the exact
inverse of a given matrix A, i.e.,

$$BA = I \tag{2-23.1}$$

In actual fact we have obtained an approximate inverse $\bar{B}$ which, if
multiplied by A, gives the unit matrix I only approximately. The
elements in the diagonal are not exactly 1 and the elements outside
the diagonal not exactly zero. Generally we can put

$$\bar{B}A = I + \varepsilon C \tag{2-23.2}$$

The magnitude of $\varepsilon$ can be normalized by the requirement that the
absolutely largest element of the matrix C shall become 1. The
smaller $\varepsilon$ is, the quicker will be the convergence of the procedure.
We will now expand B in an infinite series, starting with $\bar{B}$.

$$B = \bar{B} + \varepsilon B_1 + \varepsilon^2 B_2 + \varepsilon^3 B_3 + \cdots \tag{2-23.3}$$

Since B is the exact inverse of A, we must have

$$(\bar{B} + \varepsilon B_1 + \varepsilon^2 B_2 + \varepsilon^3 B_3 + \cdots)A = I \tag{2-23.4}$$

and carrying over the first term to the right side and dividing by $\varepsilon$,
we obtain

$$(B_1 + \varepsilon B_2 + \varepsilon^2 B_3 + \cdots)A = -C$$

Now we postmultiply on both sides by the expansion (3). On the
left side the second factor disappears since AB = I, and we get

$$B_1 + \varepsilon B_2 + \varepsilon^2 B_3 + \cdots = -C(\bar{B} + \varepsilon B_1 + \varepsilon^2 B_2 + \cdots) \tag{2-23.5}$$

Equating equal powers of $\varepsilon$ on the two sides of the equation, we
obtain the following sequence of equations which permit us to
determine the $B_1, B_2, B_3, \ldots$ by successive recurrences.

$$B_1 = -C\bar{B}, \qquad B_2 = -CB_1, \qquad B_3 = -CB_2 \tag{2-23.6}$$

A few steps will give sufficient accuracy. In fact, a large number of
these successive matrix multiplications would miss their aim by
bringing in new rounding errors which eventually become larger than
the remaining error of the process. We can omit the parameter $\varepsilon$
from our resulting formulas by putting

$$\bar{B}A = I + C \tag{2-23.7}$$

$$B = \bar{B} + B_1 + B_2 + B_3 + \cdots$$

with

$$B_1 = -C\bar{B}, \qquad B_2 = -CB_1, \qquad B_3 = -CB_2 \tag{2-23.8}$$

Wherever we stop we can take the resulting B, and multiplying it by
A, see how near the resulting product comes to the unit matrix I. If
the premises for the successful application of a perturbation method
were satisfied at all, the remaining errors will now be negligibly small.
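The correction of an approximate inverse can be sketched as follows. This is a minimal Python illustration in which the matrix C is taken as the excess of the product of the approximate inverse and A over the unit matrix, and each new term is the negative product of C with the previous term. The function names, the 2 by 2 example, and the deliberately inexact starting inverse are assumptions made for the illustration.

```python
def matmul(x, y):
    """Row-by-column product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def residual(b, a):
    """Return C = b a - I, the deviation of b a from the unit matrix."""
    prod = matmul(b, a)
    return [[prod[i][j] - (1.0 if i == j else 0.0) for j in range(len(a))]
            for i in range(len(a))]


def refine_inverse(a, b_approx, steps=3):
    """Add the correction terms B_1, B_2, ... to an approximate inverse."""
    c = residual(b_approx, a)
    total = [row[:] for row in b_approx]
    term = b_approx
    for _ in range(steps):
        term = [[-v for v in row] for row in matmul(c, term)]   # B_k = -C B_{k-1}
        total = [[t + s for t, s in zip(tr, sr)] for tr, sr in zip(total, term)]
    return total


a_example = [[4.0, 1.0], [2.0, 3.0]]
b_rough = [[0.31, -0.09], [-0.21, 0.39]]        # exact inverse: 0.3, -0.1, -0.2, 0.4
b_refined = refine_inverse(a_example, b_rough, steps=3)
error_before = max(abs(v) for row in residual(b_rough, a_example) for v in row)
error_after = max(abs(v) for row in residual(b_refined, a_example) for v in row)
```

Comparing `error_before` with `error_after` is the back-check recommended in the text: the product of the corrected inverse with A is compared with the unit matrix.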

As a second application of the perturbation method we consider the
case of a matrix which is nearly triangular. We have elements below
the main diagonal which form the triangular matrix P. Above the
diagonal we do not have zero, but elements which are small in comparison
with the diagonal terms of P. We include all these elements in the
matrix $\varepsilon C$ and put

$$A = P + \varepsilon C \tag{2-23.9}$$

Now we have no difficulty with the inversion of the triangular
matrix P, and thus we obtain the new triangular matrix Q.

$$QP = I \tag{2-23.10}$$

We can assume that this equation is satisfied with an error which is
small in comparison with $\varepsilon C$.

Let us now expand the true inverse of A in an infinite series,
starting with Q.

$$B = Q + \varepsilon B_1 + \varepsilon^2 B_2 + \varepsilon^3 B_3 + \cdots \tag{2-23.11}$$

Hence by the definition of the inverse,

$$(Q + \varepsilon B_1 + \varepsilon^2 B_2 + \varepsilon^3 B_3 + \cdots)(P + \varepsilon C) = I$$

$$QP + \varepsilon B_1P + \varepsilon^2 B_2P + \varepsilon^3 B_3P + \cdots + \varepsilon QC + \varepsilon^2 B_1C + \varepsilon^3 B_2C + \cdots = I \tag{2-23.12}$$

Since, however, QP = I, we obtain, by equating every coefficient
of this expansion to 0,

$$B_1P = -QC, \qquad B_2P = -B_1C, \qquad B_3P = -B_2C$$

and postmultiplying by Q,

$$B_1 = -QCQ, \qquad B_2 = -B_1CQ, \qquad B_3 = -B_2CQ \tag{2-23.13}$$

Once more we can formulate our results without the use of the
expansion parameter.

$$A = P + C$$

$$B = Q + B_1 + B_2 + B_3 + \cdots$$

$$B_1 = -QCQ, \qquad B_2 = -B_1CQ \tag{2-23.14}$$

Once more we can stop with $B_k$, where k is suitably chosen (it will
seldom go beyond k = 4), and once more we check the accuracy of
our results by forming the product of the resulting B with A. If the
conditions were suitable to successful application of a perturbation
method, the deviations from the unit matrix will be negligibly small.

As a third application of a perturbation procedure, we consider the
solution of the linear set of equations

$$Ax = b \tag{2-23.15}$$

We assume that we have obtained a solution of this equation by
some method, without inverting the matrix A. However, by substituting
our solution in (15) we will not get b exactly, but only approximately.
Hence we cannot call the obtained solution x, but $\bar{x}$. The
difference

$$b - A\bar{x} = b_1 \tag{2-23.16}$$

is called the “residual vector” of our solution. Our problem is now
reduced to the solution of the new set

$$Ax_1 = b_1 \tag{2-23.17}$$

Again we do not succeed exactly but obtain an approximate $\bar{x}_1$
which gives rise to a new residual vector $b_2$:

$$b_1 - A\bar{x}_1 = b_2 \tag{2-23.18}$$

together with the new equation,

$$Ax_2 = b_2 \tag{2-23.19}$$

$$x = \bar{x} + \bar{x}_1 + x_2 \tag{2-23.20}$$

This procedure can be continued until we come to a residual vector
$b_k$ which is so small that the associated $x_k$ becomes negligible. Then
x is equal to the sum of the partial solutions $\bar{x} + \bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{k-1}$.
This method of successive approximations is operative whenever 
some analog machine is employed to the solution of a linear system 
of equations. Such a machine of a mechanical or electrical type will 
seldom give an accuracy of more than 1%. This means that the 
original right side can be reduced by two significant figures if we 
substitute the obtained solution in the given set.
Now a new solution is obtained, and substituting again in the
residual set, the right side is reduced by another two significant 
figures. Hence in four successive steps, 8 significant figures can be 
gained. The original solution of moderate accuracy has been cor- 
rected to a precise solution of high accuracy. The same principle is 
applicable, however, if we are in possession of some digital method 
(elimination or iterative techniques) by which a moderately accurate 
solution of a given linear set of equations can be obtained. 
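The successive reduction of the residual vector can be sketched as follows. This is a minimal Python illustration in which the moderately accurate solving device is imitated by multiplication with a deliberately inexact inverse; the names `refine_solution`, `crude_solve`, and `B_APPROX` and the 2 by 2 example are assumptions made for the illustration.

```python
def refine_solution(a, b, crude_solve, steps=4):
    """Sum the partial solutions obtained from successive residual vectors."""
    x = [0.0] * len(b)
    residual = list(b)
    for _ in range(steps):
        correction = crude_solve(residual)          # approximate solution of A x_k = b_k
        x = [xi + ci for xi, ci in zip(x, correction)]
        residual = [bi - sum(aij * cj for aij, cj in zip(row, correction))
                    for bi, row in zip(residual, a)]  # new residual vector b_{k+1}
    return x, residual


A = [[4.0, 1.0], [2.0, 3.0]]
B_APPROX = [[0.31, -0.09], [-0.21, 0.39]]   # stands in for a device of limited accuracy


def crude_solve(rhs):
    """Imitate a moderately accurate solver by using an inexact inverse."""
    return [sum(q * r for q, r in zip(row, rhs)) for row in B_APPROX]


x, last_residual = refine_solution(A, [5.0, 5.0], crude_solve, steps=4)
```

The exact solution of the example is x = (1, 1); each pass reduces the residual by roughly the accuracy of the crude solver, which is the gain of significant figures per step described in the text.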


24. The compatibility of linear equations. The mere mathematical 
solution of a set of linear equations frequently blinds us to the dangers 
which arise in connection with large linear systems. The temptation 
is to use the large-scale computing facilities of the big electronic 
digital calculators for solution of extensive linear systems, without 
realizing that the exact mathematical solution obtained in this 
manner may have no physical significance whatever. The question 
concerning the physical significance of a mathematically correct 
solution has to be raised and the problem of “noise” has to be 
discussed. The “noise” here in question does not refer to the
“arithmetical noise” caused by the rounding errors of our calculations,
but to the “physical noise” caused by the inexactitude of
our measurements.

The following example is well suited to characterize on a simplified 
scale the difficulties inherent in the solution of linear systems of a 
strongly skew-angular type. Consider the two equations: 


$$x + y = 2.00001, \qquad x + 1.00001y = 2.00002 \tag{2-24.1}$$

The solution of this system is

$$x = 1.00001, \qquad y = 1$$


From the purely mathematical standpoint we have two equations in 
two unknowns, and the system is not singular since the determinant 
of the system is not zero. Hence the system allows a unique solution, 
and after finding that solution our task is done. 

From the standpoint of a physical system the situation is quite 
different. The right side of a system of equations is usually the result 
of physical measurements, and these measurements are of limited 
accuracy. Hence the right side of the above system may not be 
given with five, but only with two-decimal-place accuracy. But then
we see at once that our system cannot be solved for x and y. The
reason is that the unknowns x and y of our problem enter the equations
practically with their sum only. The value of x + y is well
obtainable from the given equations. But in order to separate x and
y, we need the combination $x - y$ too. Now, if we put

$$x + y = 2\xi, \qquad x - y = 2\eta$$

our equations become

$$2\xi = 2.00001, \qquad 2.00001\,\xi - 0.00001\,\eta = 2.00002 \tag{2-24.2}$$

Hence the quantity $\eta$ is only exceedingly weakly represented in our
system, and thus requires excessive accuracy for its evaluation. We
obtain $\eta$ by a division by $10^{-5}$; i.e., multiplication by $10^5$. This
requires that the right side shall be known with excessive accuracy,
which is usually not possible for physical reasons. If, for example,
the physical noise of the observations would cause an error of
0.001 in the second equation, then $\xi$ would still be practically 1,
while $\eta$ would become 100, thus giving the entirely erroneous
solution x = 101, y = −99. We should have recognized that under
the given physical circumstances we can obtain x + y with sufficient
accuracy, but separate determination of x and y is out of the question.
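The sensitivity of the example (1) can be reproduced numerically. The following is a minimal Python sketch; the function name `solve_2x2` is illustrative, and the error of 0.001 is applied to the second right side with a negative sign, which is the sign that leads to the erroneous solution quoted in the text.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2 by 2 linear system by determinants."""
    det = a11 * a22 - a12 * a21
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# The system (2-24.1) with its exact right side.
exact = solve_2x2(1.0, 1.0, 1.0, 1.00001, 2.00001, 2.00002)

# The same system with an observational error of 0.001 in the second equation.
noisy = solve_2x2(1.0, 1.0, 1.0, 1.00001, 2.00001, 2.00002 - 0.001)

print(exact, noisy)
print(exact[0] + exact[1], noisy[0] + noisy[1])   # the sum x + y is hardly affected
```

The sum x + y stays at 2.00001 in both cases, while the separate values of x and y move from about 1 to about 101 and −99.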

While in this simple example we can follow each detail of the situa- 
tion and demonstrate explicitly the unsatisfactory nature of nearly 
singular systems, we frequently accept the results deduced from 
strongly skew-angular systems without realizing that in view of the 
physical noise of the problem the mathematical solution may have 
little relation to the true values of the quantities that our solution is 
supposed to yield. 

In order to develop the proper critical faculty for realistic appraisal 
of strongly skew-angular systems, we have first to develop the mathe- 
matical theory of the compatibility of linear systems and then properly 
modify it in order to apply it to the question of the physical feasi- 
bility of a given set of linear equations. 

The mathematical compatibility problem of linear systems arises 
from the following consideration. If a problem contains n unknowns, 
our first thought is to get n equations for the determination of these 
unknowns. If the number of equations is less than n, we know in 
advance that the given information will not suffice for unique
determination of all the unknowns. If the number of equations is
more than n, we have an abundance of information which will
generally lead to contradictions. Hence we discard underdetermined 
systems because they contain too little information and cannot lead 
to a unique solution of the problem, and we discard overdetermined 
systems because they contain too much information and serve no 
useful purpose. 

It is important to realize, however, that the compatibility of a 
given set of equations bears no relation to the underdetermined or 
overdetermined or evendetermined (“balanced”) nature of the 
problem. The following two equations are underdetermined, since 
two equations are given for five unknowns. 


$$\begin{aligned} x_1 + x_2 + x_3 + x_4 + x_5 &= 3 \\ 2x_1 + 2x_2 + 2x_3 + 2x_4 + 2x_5 &= 8 \end{aligned} \tag{2-24.3}$$


But these two equations are self-contradictory and cannot be solved 
for any values of the five unknowns. On the other hand, the follow- 
ing five equations are given for only two unknowns and thus the 
system is overdetermined. 


$$\begin{aligned} x_1 + x_2 &= 0 \\ 2x_1 + 3x_2 &= -1 \\ 3x_1 + 2x_2 &= 1 \\ x_1 - x_2 &= 2 \\ 3x_1 + 5x_2 &= -2 \end{aligned} \tag{2-24.4}$$

These five equations are not contradictory, but have the solution
$x_1 = 1$, $x_2 = -1$.

The question arises whether there is a systematic way of deciding 
the compatible or incompatible nature of a given set of equations. 
Such a systematic method exists indeed and can be described as 
follows. Let us consider the linear equation 


$$Ax = b \tag{2-24.5}$$
and let us augment it by the adjoint equation

$$\tilde{A}y = c \tag{2-24.6}$$


152 Matrices and Eigenvalue Problems Chap. II 


We form the scalar product of the first equation with y, the second 
equation with x: 


$$y \cdot Ax = y \cdot b, \qquad x \cdot \tilde{A}y = x \cdot c \tag{2-24.7}$$
Now by the fundamental transposition rule (8.23) we have
$$y \cdot Ax = x \cdot \tilde{A}y \tag{2-24.8}$$

This shows that the left sides of equations (7) are equal for any choice
of x and y. Hence the right sides must also be equal and we obtain

$$y \cdot b - x \cdot c = 0 \tag{2-24.9}$$


This equation is of no particular use, however, in deciding the
compatibility of the system Ax = b, since it demands the knowledge
of x (i.e., the solution of the given system), while our aim is to decide
whether the system is solvable at all. In one particular case, however,
the vector x will drop out from our relation; viz., if c happens to be
zero. Then we get

$$y \cdot b = 0 \tag{2-24.10}$$

where y is the solution of the equation

$$\tilde{A}y = 0 \tag{2-24.11}$$


One can show that this condition is not only necessary but also 
sufficient. We thus obtain the following general principle which 
answers all compatibility problems of linear systems: “The right 
side of a given set of linear equations has to be orthogonal to any 
solution of the adjoint homogeneous equation.” 

This general principle operates in a given case in a variety of ways. 
First, it is possible that the adjoint homogeneous equation $\tilde{A}y = 0$
has no solution outside of the identical vanishing of y. In that case we 
do not get any compatibility condition, which means that the given 
set (5) is compatible with any given right side. Second, it is possible 
that the adjoint homogeneous equation $\tilde{A}y = 0$ has one and only one
solution (not counting the trivial solution y = 0 and not counting the 
freedom of an arbitrary factor by which y can be multiplied, because 
of the homogeneity of the equation). In this case the given right side 
has to satisfy one compatibility condition by being orthogonal to 
the homogeneous adjoint solution. Third, it is possible that the
adjoint homogeneous equation $\tilde{A}y = 0$ has a number of independent
solutions. In that case the given right side has to be orthogonal to 
every one of these independent solutions. 

The question of overdetermination or underdetermination or 
evendetermination does not enter specifically the application of this 
general principle except for the fact that in the case of an overdeter- 
mined system the adjoint set has always nontrivial solutions, and 
thus the given right side must always satisfy one or more conditions. 

Examples. 1. We consider the linear system 


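$$\begin{aligned} x_1 + x_2 + x_3 &= 1 \\ x_1 + 2x_2 + 3x_3 &= 3 \end{aligned}$$

Here the matrix A is
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix}$$

The adjoint matrix becomes
$$\tilde{A} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}$$

This gives rise to the adjoint equations

$$\begin{aligned} y_1 + y_2 &= 0 \\ y_1 + 2y_2 &= 0 \\ y_1 + 3y_2 &= 0 \end{aligned}$$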


These equations have no nonvanishing solution, and thus the given 
equations are compatible, irrespective of what the right side is. 
2. We now consider the system 


$$\begin{aligned} x_1 + x_2 + x_3 &= 1 \\ 2x_1 + 2x_2 + 2x_3 &= 3 \end{aligned}$$

The adjoint system is
$$\begin{aligned} y_1 + 2y_2 &= 0 \\ y_1 + 2y_2 &= 0 \\ y_1 + 2y_2 &= 0 \end{aligned}$$

This system has the solution
$$y_1 = -2, \qquad y_2 = 1$$

Hence the right side has to satisfy the condition
$$-2c_1 + c_2 = 0 \quad \text{which means} \quad c_2 = 2c_1$$


The given right side does not satisfy this condition, and thus the 
equations are incompatible and thus unsolvable. 
3. We consider the overdetermined system 


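$$\begin{aligned} x_1 + x_2 &= 0 \\ 2x_1 + 3x_2 &= -1 \\ 3x_1 + 2x_2 &= 1 \\ x_1 - x_2 &= 2 \\ 3x_1 + 5x_2 &= -2 \end{aligned}$$

The matrix A has now two columns and five rows.

$$A = \begin{pmatrix} 1 & 1 \\ 2 & 3 \\ 3 & 2 \\ 1 & -1 \\ 3 & 5 \end{pmatrix}$$

The adjoint system is composed of two equations in five unknowns.

$$\begin{aligned} y_1 + 2y_2 + 3y_3 + y_4 + 3y_5 &= 0 \\ y_1 + 3y_2 + 2y_3 - y_4 + 5y_5 &= 0 \end{aligned}$$

This system has three independent solutions:

$$\begin{aligned} y_1 &= -5, & y_2 &= 1, & y_3 &= 1, & y_4 &= 0, & y_5 &= 0 \\ y_1 &= -5, & y_2 &= 2, & y_3 &= 0, & y_4 &= 1, & y_5 &= 0 \\ y_1 &= 0, & y_2 &= 9, & y_3 &= -1, & y_4 &= 0, & y_5 &= -5 \end{aligned}$$

Correspondingly the right sides have to satisfy three conditions:

$$\begin{aligned} -0\cdot 5 - 1\cdot 1 + 1\cdot 1 + 2\cdot 0 - 2\cdot 0 &= 0 \\ -0\cdot 5 - 1\cdot 2 + 1\cdot 0 + 2\cdot 1 - 2\cdot 0 &= 0 \\ 0\cdot 0 - 1\cdot 9 + 1\cdot(-1) + 2\cdot 0 - 2\cdot(-5) &= 0 \end{aligned}$$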


Since these conditions are actually satisfied, the given overdetermined 
system is compatible. 
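
The orthogonality test of the three examples can be mechanized. What follows is a minimal sketch in Python, not part of the original text; it assumes exact rational arithmetic from the standard library, and the helper names `transpose`, `null_space`, and `is_compatible` are hypothetical. It finds the independent solutions of the adjoint homogeneous equation by elimination and checks that the right side is orthogonal to each of them.

```python
from fractions import Fraction

def transpose(rows):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*rows)]

def null_space(rows):
    """Basis of all y with (rows) y = 0, by exact Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0])
    pivots = []                      # pivot column of each reduced row
    r = 0
    for c in range(n_cols):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                 # no pivot here: c is a free column
        m[r], m[p] = m[p], m[r]
        m[r] = [v / m[r][c] for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [u - f * v for u, v in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        y = [Fraction(0)] * n_cols
        y[free] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            y[pc] = -m[row_idx][free]
        basis.append(y)
    return basis

def is_compatible(A, b):
    """True if b is orthogonal to every solution of the adjoint equation."""
    adjoint_solutions = null_space(transpose(A))
    return all(sum(yi * bi for yi, bi in zip(y, b)) == 0
               for y in adjoint_solutions)

A_over = [[1, 1], [2, 3], [3, 2], [1, -1], [3, 5]]      # system (24.4)
b_over = [0, -1, 1, 2, -2]
print(is_compatible(A_over, b_over))                    # Example 3
print(is_compatible([[1, 1, 1], [2, 2, 2]], [1, 3]))    # Example 2
```

The basis returned by `null_space` need not coincide with the particular solutions quoted in the text; any basis of the same solution space leads to the same verdict.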

The decision that a certain system of equations is compatible does 
not necessarily imply that the solution of the system is unique. A 
compatible system of equations may have one or more solutions. 
In our first example we had an underdetermined compatible system
which has an infinity of solutions. In our second example we had an
underdetermined incompatible system which had no solutions.
In our third example we have an overdetermined compatible
system which has a unique solution. Generally we can say that if a
compatibility condition is satisfied, we can drop one of the equations 
as superfluous. An overdetermined system will thus turn to an 
evendetermined system. But it is also possible that it may turn into 
an underdetermined system. Consider, for example, the following 
four equations in three unknowns: 


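$$\begin{aligned} x_1 + 3x_2 - 2x_3 &= 11 \\ 2x_1 - 5x_2 + 7x_3 &= -11 \\ -x_1 + 2x_2 - 3x_3 &= 4 \\ x_1 + 2x_2 - x_3 &= 8 \end{aligned}$$

The adjoint equations become

$$\begin{aligned} y_1 + 2y_2 - y_3 + y_4 &= 0 \\ 3y_1 - 5y_2 + 2y_3 + 2y_4 &= 0 \\ -2y_1 + 7y_2 - 3y_3 - y_4 &= 0 \end{aligned}$$

They have two independent solutions:

$$\begin{aligned} y_1 &= 1, & y_2 &= 5, & y_3 &= 11, & y_4 &= 0 \\ y_1 &= 9, & y_2 &= 1, & y_3 &= 0, & y_4 &= -11 \end{aligned}$$

Correspondingly the given right side has to satisfy two conditions:

$$\begin{aligned} 11\cdot 1 - 11\cdot 5 + 4\cdot 11 + 8\cdot 0 &= 0 \\ 11\cdot 9 - 11\cdot 1 + 4\cdot 0 + 8\cdot(-11) &= 0 \end{aligned}$$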


These conditions are actually satisfied and the compatibility of the 
given system is thus guaranteed. In accordance with the general 
principle we can drop as many equations as we have independent 
compatibility conditions. This reduces the number of independent 
equations to two, and the system becomes underdetermined, since 
we have only two equations left for three unknowns. 
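
The count of independent equations can be checked by computing ranks. The sketch below is an illustration added here, not the author's procedure; it assumes exact rational arithmetic, and the function name `rank` is hypothetical. A system is compatible when appending the right side as an extra column does not raise the rank, and the rank itself is the number of independent equations that remain.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [u - f * v for u, v in zip(m[i], m[r])]
        r += 1
    return r

A = [[1, 3, -2], [2, -5, 7], [-1, 2, -3], [1, 2, -1]]
b = [11, -11, 4, 8]
augmented = [row + [bi] for row, bi in zip(A, b)]

independent = rank(A)                    # independent equations left
compatible = rank(augmented) == independent
print(independent, compatible, len(A) - independent)
```

Here `len(A) - independent` is the number of equations that can be dropped, which equals the number of independent adjoint solutions.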


25. Overdetermination and the principle of least squares. The 
difficulties inherent in many large-scale linear systems are comparable 
to the difficulties of an orator who in his speech tries to cover too 
large a variety of items. In the beginning his speech goes on rather 
fluently. However, as he checks off more and more items on his list, 
he becomes more and more tired and occasionally loses track of his 
thoughts. He does not remember all the incidents he wanted to tell 
at the right moment and thus omits certain items and repeats instead 
with different words the things he has previously said. Since he 
does not stick very closely to the truth, he comes into contradictions 
by forgetting in what direction he slanted the story in his earlier 
remarks (violating the old oratorical principle Mendacem oportet
esse memorem—The liar should have a good memory). In the last 
ten minutes his mind goes completely blank, he garbles everything, 
and finally sits down to thunderous applause. 

The liar of bad memory of our story is the “noise” which inter- 
feres with the accuracy of our measurements and distorts the true 
course of events. Since noise is of a random nature, the distortion 
is not consistent but occurs once in one, once in the other direction. 
This is one danger encountered in large-scale recordings of physical 
events. The other danger is that the information we have at our 
disposal is insufficient for actual determination of all the unknowns
of the problem. In our story the speaker omitted to comment on 
certain items of his journey and replaced these comments by retelling 
with different words certain episodes on which he commented before. 
In analogy to this situation it can happen (and it frequently does 
happen) that the statements of our system of equations are insufficient 
for complete determination of all the unknowns of our problem. We 
count the number of equations and find that we have just as many 
equations as unknowns. Hence we think that our system is balanced 
and allows a unique solution. Yet it can happen that certain 
equations merely repeat in different words the statements 
made before, without adding anything essentially new to the 
previous statements. In this case our system is underdetermined 
and not in the position to yield a complete solution of our 
problem. 

The two difficulties are interconnected. If the left sides of the 
equations are interdependent, the right sides have to satisfy certain 
compatibility conditions. But these conditions may not be satisfied 
in view of the noise of the measurements, which renders our equa- 
tions incompatible in the strict mathematical sense. Coupled with 
this incompatibility is the fact that our equations are not sufficient 
for determination of all the unknowns of our problem, since in the
case of compatibility we could drop a certain number of equations 
as superfluous, in which case we have not enough equations for the 
complete solution. 

The problems of underdetermination and overdetermination are 
thus interlocked. Our system is in fact underdetermined because of 
absence of certain linear combinations of the unknowns, which 
makes it impossible to obtain all the unknowns of the system. A 
corresponding number of equations becomes superfluous and could 
be dropped. Hence an n by n set of equations which omits m linear 
combinations of the unknowns represents in reality a set of n — m 
equations in n — m unknowns, and is thus overdetermined by m 
equations. 

One of the two difficulties is solvable; viz., the problem of over- 
determination. The ingenious method of least squares makes it 
possible to adjust an arbitrarily overdetermined and incompatible 
set of equations. In fact, we make an asset out of a liability and try 
to overdetermine a set of equations as much as possible by making an 
arbitrary number of surplus observations beyond the minimum
number demanded by the number of unknowns. We now have a 
linear set of equations, characterized by 


$$Ax = b \tag{2-25.1}$$


where A is not a square matrix, but a matrix which has many more 
rows than columns. In view of the errors of our measurements, 
the equations become mathematically incompatible. This means that 
we cannot make all the components of the residual vector Ax — b 
equal to zero. 

But now we can ask for the “best” solution which is still available 
under the given circumstances. For this purpose we form the residual 
vector 

$$Ax - b = r \tag{2-25.2}$$

and now we take the square of the length of the residual vector and
determine x by the condition that $r^2$ shall become a minimum. The
problem of minimizing $(Ax - b)^2$ has always a definite solution, no
matter how compatible or incompatible the given system is. If
the given system is compatible, the least square solution x, if
substituted in Ax − b, will automatically give 0. If the given
system is incompatible, the residual vector Ax − b will not
give zero but the solution of smallest error in the sense that the
sum of squares of the residuals will be smaller than that for any
other choice of x.

The least-square solution thus completely dispenses with the 
investigation of the compatibility of a given system since we are 
reconciled to the fact that we do not get the exact solution of our 
problem, but the best solution possible under the given circumstances. 
If the given system is compatible, the residual vector of the least- 
square solution becomes automatically zero, thus proving that the 
system is compatible. 

The least-square solution of the equation Ax = b becomes 


$$\tilde{A}Ax = \tilde{A}b \tag{2-25.3}$$
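
The step from the minimum condition to this equation is not spelled out in the text. A brief sketch of one way to obtain it, using only the transposition rule quoted earlier (the expansion below is an added illustration, not the author's own derivation):

```latex
\begin{aligned}
r^2 &= (Ax - b)\cdot(Ax - b)
     = x\cdot\tilde{A}Ax \;-\; 2\,x\cdot\tilde{A}b \;+\; b\cdot b, \\
\frac{\partial r^2}{\partial x} &= 2\,\tilde{A}Ax - 2\,\tilde{A}b = 0
\quad\Longrightarrow\quad \tilde{A}Ax = \tilde{A}b .
\end{aligned}
```

The first line uses $Ax\cdot Ax = x\cdot\tilde{A}Ax$ and $Ax\cdot b = x\cdot\tilde{A}b$; the second sets the derivative with respect to every component of x equal to zero.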


The remarkable fact about this equation is that it always gives an 
evendetermined system of just as many equations as we have un- 
knowns, no matter how strongly overdetermined the original system 
has been. We see at once from the figure that the product of an
m row, n column matrix multiplied by an n row, m column matrix 
gives an m by m square matrix, and thus in the final set the 
number of equations balances the number of unknowns, no 
matter how many equations the original system contained. Moreover,
the matrix $\tilde{A}A$ is always a symmetric matrix,

$$\widetilde{(\tilde{A}A)} = \tilde{A}A \tag{2-25.4}$$

and its eigenvalues are not only real but positive or in the limit zero.


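[Figure: schematic of the matrix product $\tilde{A}A$]

Example. In our 5 by 2 overdetermined system (24.4), we obtain

$$\tilde{A}A = \begin{pmatrix} 1 & 2 & 3 & 1 & 3 \\ 1 & 3 & 2 & -1 & 5 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 2 & 3 \\ 3 & 2 \\ 1 & -1 \\ 3 & 5 \end{pmatrix} = \begin{pmatrix} 24 & 27 \\ 27 & 40 \end{pmatrix}$$

Moreover $\tilde{A}b$ becomes

$$\tilde{A}b = \begin{pmatrix} 1 & 2 & 3 & 1 & 3 \\ 1 & 3 & 2 & -1 & 5 \end{pmatrix} \begin{pmatrix} 0 \\ -1 \\ 1 \\ 2 \\ -2 \end{pmatrix} = \begin{pmatrix} -3 \\ -13 \end{pmatrix}$$

Hence
$$24x_1 + 27x_2 = -3, \qquad 27x_1 + 40x_2 = -13 \tag{2-25.5}$$
which has the solution
$$x_1 = 1, \qquad x_2 = -1$$
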
Substituting in the original set (24.4), we find that all equations are 
satisfied. The compatibility of the given set is thus demonstrated. 
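
The arithmetic of this example is easy to reproduce. The following is a small sketch, added for illustration and assuming exact rational arithmetic; the function names `normal_equations` and `solve_2x2` are hypothetical. It forms the products column by column and solves the resulting 2 by 2 set by Cramer's rule.

```python
from fractions import Fraction

def normal_equations(A, b):
    """Return (A~A, A~b) for a matrix A given as a list of rows."""
    cols = list(zip(*A))
    N = [[sum(Fraction(p) * q for p, q in zip(ci, cj)) for cj in cols]
         for ci in cols]
    rhs = [sum(Fraction(p) * q for p, q in zip(ci, b)) for ci in cols]
    return N, rhs

def solve_2x2(N, rhs):
    """Cramer's rule for two equations in two unknowns."""
    det = N[0][0] * N[1][1] - N[0][1] * N[1][0]
    x1 = (rhs[0] * N[1][1] - N[0][1] * rhs[1]) / det
    x2 = (N[0][0] * rhs[1] - N[1][0] * rhs[0]) / det
    return [x1, x2]

A = [[1, 1], [2, 3], [3, 2], [1, -1], [3, 5]]   # system (24.4)
b = [0, -1, 1, 2, -2]

N, rhs = normal_equations(A, b)
x = solve_2x2(N, rhs)
residual = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
print(N, rhs, x, residual)
```

A vanishing residual vector confirms the compatibility of the set, as stated in the text.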

On the other hand, even the least-square formulation of an under- 
determined system cannot help us in getting a unique solution. If 
certain combinations of the unknowns are missing in our equations, 
there is no magic by which they could be conjured up. An under- 
determined system remains thus underdetermined, even in the least- 
square formulation. However, an incompatible underdetermined 
system is transformed into a compatible underdetermined system.

The following four equations in three unknowns represent an ap- 
parently overdetermined but in fact underdetermined and 
incompatible set of equations. 


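$$\begin{aligned} x_1 + 3x_2 - 2x_3 &= 11 \\ 2x_1 - 5x_2 + 7x_3 &= -10 \\ -x_1 + 2x_2 - 3x_3 &= 4 \\ x_1 + 2x_2 - x_3 &= 8 \end{aligned} \tag{2-25.6}$$

The least-square formulation of these equations becomes

$$\begin{aligned} 7x_1 - 7x_2 + 14x_3 &= -5 \\ -7x_1 + 42x_2 - 49x_3 &= 107 \\ 14x_1 - 49x_2 + 63x_3 &= -112 \end{aligned} \tag{2-25.7}$$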


The transposed matrix is now identical with the original one, and we 
have to look for the solutions of the homogeneous set: 


$$\begin{aligned} 7y_1 - 7y_2 + 14y_3 &= 0 \\ -7y_1 + 42y_2 - 49y_3 &= 0 \\ 14y_1 - 49y_2 + 63y_3 &= 0 \end{aligned} \tag{2-25.8}$$

The solution is

$$y_1 = 1, \qquad y_2 = -1, \qquad y_3 = -1 \tag{2-25.9}$$


The right side of (7) satisfies the condition of being orthogonal to the 
homogeneous solution. We can now drop the third equation of the 
set (7) and reduce it to the set of two equations 


$$\begin{aligned} 7x_1 - 7x_2 + 14x_3 &= -5 \\ -7x_1 + 42x_2 - 49x_3 &= 107 \end{aligned}$$

which has an infinity of solutions. The choice $x_3 = 0$ leads to the
solution
$$x_1 = 2.2, \qquad x_2 = 2.9143, \qquad x_3 = 0$$

which gives, if substituted back in (6), the right sides
$$10.943, \quad -10.171, \quad 3.629, \quad 8.029$$

instead of the required
$$11, \quad -10, \quad 4, \quad 8$$


which are not attainable because of the incompatibility of the given 
equations. 
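
The numbers of this example can be retraced in a few lines. The sketch below is an added illustration in exact rational arithmetic (the variable names are hypothetical): it builds the least-square set (7), verifies the orthogonality of its right side to the homogeneous solution (9), solves the first two equations with $x_3 = 0$, and substitutes back in (6).

```python
from fractions import Fraction

A = [[1, 3, -2], [2, -5, 7], [-1, 2, -3], [1, 2, -1]]   # set (6)
b = [11, -10, 4, 8]

cols = list(zip(*A))
N = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
rhs = [sum(p * q for p, q in zip(ci, b)) for ci in cols]

# Right side of (7) against the homogeneous solution (9).
y = [1, -1, -1]
orthogonality = sum(r * yi for r, yi in zip(rhs, y))

# Drop the third equation of (7), put x3 = 0, apply Cramer's rule.
det = N[0][0] * N[1][1] - N[0][1] * N[1][0]
x = [Fraction(rhs[0] * N[1][1] - N[0][1] * rhs[1], det),
     Fraction(N[0][0] * rhs[1] - N[1][0] * rhs[0], det),
     Fraction(0)]

fitted = [sum(a * xi for a, xi in zip(row, x)) for row in A]
print([round(float(v), 3) for v in fitted])
```

Adding any multiple of (1, −1, −1) to `x` leaves `fitted` unchanged, which is the underdetermination that the least-square formulation cannot remove.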


26. Natural and artificial skewness of a linear set of equations. The 
difficulty of inverting the matrix of a strongly skew-angular set of 
equations is sometimes unnecessarily increased by an improper 
scaling of the variables. We conceived the equation 


Ax = b (2-26.1) 


as the problem of analyzing the given vector b in a given skew- 
angular reference system, determined by the columns of A: 


$$x_1u_1 + x_2u_2 + \cdots + x_nu_n = b \tag{2-26.2}$$


[Figure: skew-angular base vectors $u_1$, $u_2$, ...]

The determinant | A | of the linear set (1) has the following striking
geometrical significance. It represents the volume included by the
skew-angular base vectors $u_1, u_2, \ldots, u_n$. This determinant can
become very small for two reasons. One reason is that the vectors $u_i$
are at a very small angle to each other. The other is that the length
of some of these vectors becomes very small. The latter fact is caused
by an inadequate scaling of the unknowns $x_1, x_2, \ldots, x_n$, and can
be avoided. By a mere rescaling of the variables according to the
equations

$$x_1 = \alpha_1 x_1', \quad x_2 = \alpha_2 x_2', \quad \ldots, \quad x_n = \alpha_n x_n' \tag{2-26.3}$$

we can multiply the vectors $u_1, u_2, \ldots, u_n$ by arbitrary constants
$\alpha_1, \alpha_2, \ldots, \alpha_n$ and can thus avoid any strong discrepancies in their
length. In fact, we can normalize the length of every column exactly 
to 1. In that case the determinant of the new set is truly a fair measure 
of the skewness of the system. This determinant never exceeds 1
in absolute value. The maximum value 1 belongs to the case
of orthogonal axes. The smaller the determinant, the smaller is the 
mutual independence of the system. On the lower end of the scale we 
find the determinant zero which occurs if one of the base vectors 
becomes a linear combination of the others.

A further advantage of this normalization is that all the elements of 
the matrix become numbers between −1 and +1. The operation with exces-
sively large or excessively small numbers is avoided.
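
The effect of column normalization on the determinant can be seen in a small numerical sketch. The matrix below is a made-up illustration (it does not come from the text), and the helper names `det` and `normalize_columns` are hypothetical. The second column is short, so the raw determinant is small even though the two columns are far from parallel.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total

def normalize_columns(m):
    """Scale every column to unit length; return scaled matrix and lengths."""
    n_cols = len(m[0])
    lengths = [math.sqrt(sum(row[j] ** 2 for row in m)) for j in range(n_cols)]
    scaled = [[row[j] / lengths[j] for j in range(n_cols)] for row in m]
    return scaled, lengths

# Hypothetical example: the second column is short but not nearly parallel.
A = [[3.0, 0.003],
     [4.0, 0.001]]

scaled, lengths = normalize_columns(A)
raw_det = det(A)          # small mainly because of the short second column
fair_det = det(scaled)    # measures the skewness alone
print(raw_det, fair_det)
```

Dividing column k by its length corresponds to choosing the constant that multiplies $u_k$ as the reciprocal of that length; the solution of the rescaled system has to be converted back to the original unknowns afterwards.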

There is, however, a second process of scaling to which we have to 
pay attention. We have seen that in the least-square formulation of a 
linear set of equations the sum of the squares of the residuals was 
formed and this quantity was minimized. Now it is possible that 
the equations themselves are not properly balanced in the process of 
forming the sum of the squares. An arbitrary equation

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n - b_i$$

could first be multiplied by an arbitrary factor $\beta_i$ before it is squared
and added to the sum of the squares of the other equations. We can
use these factors $\beta_i$ to avoid the danger that certain equations are
stated with a too small or a too large weight factor. If the weight 
factor of an individual equation is too small, that equation is 
practically nonexistent in comparison with the other equations, which 
means that we lose one of the important pieces of information and 
reduce our system to a practically underdetermined system. If the 
weight factor is too large, we overemphasize that particular piece of 
information at the cost of all the others, which leads to an even 
worse case of underdetermination, excluding all the other valuable 
pieces of information. The best balance is obtained if we choose our 
$\beta_i$ factors in such a way that the sum of the squares of the elements of
each row, excluding the right side, shall become approximately 1.
With this normalization we have done equal justice to every equation 
and have avoided the two extremes of either overemphasizing or 
underemphasizing one particular equation. 

The two normalizations—the one pertaining to the rows, the 
other to the columns—cannot be performed simultaneously (except in 
the case of symmetric matrices). The balancing of the length of the 
columns will upset the balancing of the length of the rows. Several 
alternating readjustments of rows and columns will be demanded 
before we can be satisfied that the sums of the squares of the coeffi- 
cients in both rows and columns are sufficiently equalized. The 
equalization need not be carried out with great accuracy as long as 
the deviation from an average length is not serious in either rows or 
columns. The double equalization guarantees that the natural
skewness of the system is not made worse artificially by inadequate 
scaling of either equations or unknowns. 
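
The alternating readjustment of rows and columns can be sketched as a simple loop. This is an added illustration under stated assumptions, not a procedure given in the text: the function name `equalize` and the test matrix are hypothetical, all elements are taken nonzero, and only the matrix is scaled (in practice each row factor must also be applied to the right side, and each column factor undone in the solution).

```python
import math

def equalize(matrix, sweeps=20):
    """Alternately scale rows and columns toward unit Euclidean length."""
    m = [[float(v) for v in row] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(sweeps):
        for i in range(n_rows):                       # row (equation) factors
            length = math.sqrt(sum(v * v for v in m[i]))
            m[i] = [v / length for v in m[i]]
        for j in range(n_cols):                       # column (unknown) factors
            length = math.sqrt(sum(m[i][j] ** 2 for i in range(n_rows)))
            for i in range(n_rows):
                m[i][j] /= length
    return m

def row_lengths(m):
    return [math.sqrt(sum(v * v for v in row)) for row in m]

def col_lengths(m):
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*m)]

# Hypothetical badly scaled square matrix.
A = [[10.0, 2.0, 3.0],
     [4.0, 0.5, 6.0],
     [7.0, 8.0, 90.0]]

balanced = equalize(A, sweeps=200)
print(row_lengths(balanced), col_lengths(balanced))
```

As the text remarks, great accuracy is not needed; a handful of sweeps normally brings rows and columns close enough to a common length, and the large sweep count above is used only to make the check below strict.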


27. Orthogonalization of an arbitrary linear system. We have 
seen that the difficulty with strongly skew-angular systems is that 
certain linear combinations of the unknowns enter the equations with 
excessively small weight. Hence it is not easy to disentangle each 
unknown for itself. This would require an accuracy on the right side 
of the equations, which is frequently not at our disposal. For a 
general analysis of our system the original skew-angular system of 
base vectors, provided by the successive columns of the matrix A, is 
not particularly suited, since we cannot operate effectively with 
skew-angular axes. The fortunate circumstance holds that by mere 
rotation of the reference system, our strongly skew-angular base 
vectors open up more and more. We can introduce a new reference 
system by mere rotation, without any other artifices, in which the 
originally arbitrarily skew-angular axes become completely ortho- 
gonal. This construction is entirely general and is applicable even 
to singular systems. Any arbitrary matrix can be orthogonalized by a 
proper rotation of the frame of reference. 

This result seems at first sight paradoxical since we would think 
that a mere rotation of the reference system cannot alter the mutual 
orientation of the base vectors. However, in this transformation the 
matrix A participates as a matrix and not as an assembly of vectors. 
We have looked upon the matrix A as if its columns represented a 
succession of vectors. While this viewpoint is justified for a geo- 
metrical interpretation of a set of linear equations, we have to realize 
that the transformation of a matrix does not follow the transforma- 
tion law of a vector. A vector is transformed according to the law

$$\bar{b} = \tilde{U}b \tag{2-27.1}$$
and a matrix according to the law
$$\bar{A} = \tilde{U}AU \tag{2-27.2}$$
where U is an orthogonal matrix
$$\tilde{U}U = I \tag{2-27.3}$$


The columns of a matrix A are thus not invariant in either length or 
orientation if an orthogonal transformation is performed. If we
conceive the columns of A as base vectors, these vectors change their 
length and orientation under the impact of an orthogonal transforma- 
tion. There exists one particular coordinate transformation which 
reorients these axes in a particularly desirable manner by making 
them mutually orthogonal and thus changing an arbitrarily skew-
angular system to an orthogonal system. 

We consider the equation 


$$Ax = b \tag{2-27.4}$$
in its least-square formulation:
$$\tilde{A}Ax = \tilde{A}b \tag{2-27.5}$$


Now the matrix $\tilde{A}A$ is symmetric. Its eigenvalues are all real and
positive (or zero) and its principal axes are orthogonal. The eigen-
values of $\tilde{A}A$ are in no direct relation to the eigenvalues of A itself.
The latter eigenvalues $\lambda_i$ may be arbitrary complex quantities,
while the eigenvalues of $\tilde{A}A$ are real. We will call the latter eigen-
values $\mu_i$, in distinction to the eigenvalues of A itself, but it will be
still more suitable to call them $\mu_i^2$. This brings into evidence that the
eigenvalues of $\tilde{A}A$ are not only real but even positive. Moreover, if
A happens to be symmetric in itself, then $\tilde{A}A = A^2$, and the eigen-
values of $\tilde{A}A$ become $\lambda_i^2$, in which case $\mu_i = |\lambda_i|$.

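Now the principal axes of $\tilde{A}A$ form a real orthogonal reference
system in the n-dimensional space which can be introduced as a new
reference system. In the new system,

$$\tilde{\bar{A}}\bar{A}\bar{x} = \tilde{\bar{A}}\bar{b} \tag{2-27.6}$$

Moreover, in the new system $\tilde{A}A$ becomes a diagonal matrix.

$$\tilde{\bar{A}}\bar{A} = \operatorname{diag}(\mu_1^2, \mu_2^2, \ldots, \mu_n^2) \tag{2-27.7}$$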


Since, however, the matrix product $\tilde{A}A$ means the column-by-column
multiplication of the matrix by itself, we obtain for the base vectors
$\bar{a}_i$ of the new linear system:

$$\bar{a}_i \cdot \bar{a}_k = 0 \qquad (i \neq k) \tag{2-27.8}$$

The orthogonality of the new base vectors is thus demonstrated, 
although the length of these vectors is by no means equal to 1. The
eigenvalues $\mu_i^2$ of $\tilde{A}A$ can be interpreted as the squares of the lengths
of the new base vectors $\bar{a}_i$.

The orthogonal transformation here required never fails to exist.
Some of the $\mu_i^2$ may become equal, in which case the orthogonal
matrix U is not uniquely determined, since in certain directions
circular conditions hold, and any two mutually perpendicular axes
of the length 1 may be chosen as principal axes. But this means only
that in such cases the orthogonal transformation which can accomplish
our final goal of transforming our original skew-angular axes to
a rectangular system is not unique. In the case of singular systems one
or more of the $\mu_i$ values become zero but otherwise the general laws
remain valid. However, since the sum of squares of the elements of a
column can vanish only if every element vanishes, we see that in the
case of a singular matrix an entire column of $\bar{A}$ becomes zero.

An interesting example of such a transformation is provided by an 
extreme case which is instructive because it demonstrates the behavior 
of strongly skew-angular systems. We consider the matrix 


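$$A = \begin{pmatrix} a_1 & a_1 & \cdots & a_1 \\ a_2 & a_2 & \cdots & a_2 \\ \vdots & \vdots & & \vdots \\ a_n & a_n & \cdots & a_n \end{pmatrix}$$

Here all the axes of the original skew-angular system collapse into
one, thus realizing the most extreme case of linear dependence. The
matrix $\tilde{A}A$ has now only a single element $\sum a_i^2$, repeated $n^2$ times. We
can normalize it to 1.

$$a_1^2 + a_2^2 + \cdots + a_n^2 = 1$$

The solution of the eigenvalue problem of $\tilde{A}A$ becomes

$$\lambda = n: \quad x_1 = x_2 = \cdots = x_n = \frac{1}{\sqrt{n}}; \qquad \lambda = 0: \quad x_1 + x_2 + \cdots + x_n = 0$$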


The second solution splits into n — 1 orthogonal axes, since the 
eigenvalue $\lambda = 0$ has the multiplicity n − 1. The orthogonal
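matrix U whose columns give the eigenvectors of $\tilde{A}A$, can be written
as follows:

$$U = \begin{pmatrix}
\frac{1}{\sqrt{n}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2\cdot 3}} & \cdots & \frac{1}{\sqrt{(n-1)n}} \\
\frac{1}{\sqrt{n}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2\cdot 3}} & \cdots & \frac{1}{\sqrt{(n-1)n}} \\
\frac{1}{\sqrt{n}} & 0 & -\frac{2}{\sqrt{2\cdot 3}} & \cdots & \frac{1}{\sqrt{(n-1)n}} \\
\vdots & \vdots & \vdots & & \vdots \\
\frac{1}{\sqrt{n}} & 0 & 0 & \cdots & -\frac{n-1}{\sqrt{(n-1)n}}
\end{pmatrix}$$

and, forming the product $\tilde{U}AU$, we obtain the transformed matrix $\bar{A}$
in the new reference system:

$$\bar{A} = \begin{pmatrix}
a_1 + a_2 + \cdots + a_n & 0 & \cdots & 0 \\
\frac{\sqrt{n}}{\sqrt{2}}\,(a_1 - a_2) & 0 & \cdots & 0 \\
\frac{\sqrt{n}}{\sqrt{2\cdot 3}}\,(a_1 + a_2 - 2a_3) & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
\frac{\sqrt{n}}{\sqrt{(n-1)n}}\,\bigl(a_1 + a_2 + \cdots - (n-1)a_n\bigr) & 0 & \cdots & 0
\end{pmatrix}$$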


The orthogonality conditions are trivial in this case, since the last 
n — 1 vectors vanish identically. But our example demonstrates 
what happens if the original matrix A is less extreme by being com- 
posed of columns which do not collapse into one, although they differ 
from each other by only small amounts. In this case the zero field of
the above matrix $\bar{A}$ fills up with small elements, but the columns
remain orthogonal to each other. The sums of the squares of the
elements of the successive columns give $\mu_1^2, \mu_2^2, \ldots, \mu_n^2$. While
$\mu_1$ can again be normalized to 1, by applying a universal scale factor
to A, the consecutive $\mu_2, \mu_3, \ldots, \mu_n$ will be small numbers, if the
columns of the matrix A are without exception at very small angles to
each other.
But it is equally possible that we have only a few “bad” axes which
are at a very small angle to the subspace included by the previous 
axes. The product 


$$\Delta = \mu_1\mu_2\cdots\mu_n \tag{2-27.9}$$

is equal to the absolute value of the determinant of A. If n is large and
the determinant of A is very small, this can be brought about in two
ways. We might have a relatively large number of $\mu_i$ of moderate
smallness, or we might have a relatively large number of $\mu_i$ of the
order of magnitude 1 and a few excessively small $\mu_i$. At all events
the new reference system is exceedingly well suited to the numerical 
appraisal as well as theoretical study of large linear systems. 
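
The collapsed-column example lends itself to a direct numerical check. The sketch below is an added illustration for n = 4 in floating point; the function names are hypothetical, and the signs chosen for the columns of U are an assumption (any other orthonormal basis of the zero eigenvalue would serve). It builds an orthogonal U whose first column is (1, ..., 1)/√n, forms the product of the transposed U, A, and U, and confirms that only the first column of the result survives.

```python
import math

def helmert_like(n):
    """Orthogonal matrix with first column (1,...,1)/sqrt(n)."""
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        U[i][0] = 1 / math.sqrt(n)
    for k in range(1, n):            # column k: k equal entries, then -k, then zeros
        s = math.sqrt(k * (k + 1))
        for i in range(k):
            U[i][k] = 1 / s
        U[k][k] = -k / s
    return U

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(p, q):
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*q)]
            for row in p]

n = 4
norm = math.sqrt(sum(k * k for k in range(1, n + 1)))
a = [k / norm for k in range(1, n + 1)]      # sum of squares equals 1
A = [[a_i] * n for a_i in a]                 # every column equals the vector a

U = helmert_like(n)
identity_check = matmul(transpose(U), U)
A_bar = matmul(matmul(transpose(U), A), U)

first_column_sq = sum(A_bar[i][0] ** 2 for i in range(n))   # mu_1 squared
print(A_bar[0][0], first_column_sq)
```

With the normalization used here the squared length of the surviving column equals n, the single nonzero eigenvalue of the symmetrized matrix.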


28. The effect of noise on the solution of large linear systems. 
Inversion of large matrices is a numerically cumbersome procedure. 
If with the proper circumspection we have succeeded in obtaining a 
mathematically satisfactory solution, it is still a question to what 
extent that solution has a bearing on the given physical problem. 
Very accurate calculations require very accurate data. But the data 
of physical systems are frequently far from that accuracy which is 
required by the mathematical calculations. In particular the right 
side of linear systems is frequently the result of physical measurements 
and cannot be guaranteed to more than two or three significant 
figures. It is thus imperative that we should investigate what effect 
small but random changes of the elements of the right side have on
the solution. This investigation is not simple in the original skew-
angular reference system established by the successive columns of A,
but we get ideally suited conditions if we introduce that rotated
reference system in which $\tilde{A}A$ is transformed to a diagonal matrix.

The transformation is characterized by the equations

$$x = U\bar{x}, \qquad \bar{x} = \tilde{U}x \tag{2-28.1}$$

Now in the new reference system the matrix A became orthogonal with
columns $\bar{a}_i$, whose lengths are $\mu_1, \mu_2, \ldots, \mu_n$. The solution of
the system $\tilde{A}Ax = \tilde{A}b$ can now be given as follows. On the left side
the unknowns are separated.

$$\mu_1^2\bar{x}_1, \quad \mu_2^2\bar{x}_2, \quad \ldots, \quad \mu_n^2\bar{x}_n \tag{2-28.2}$$

On the right side we have to perform the operation

$$\bar{a}_i \cdot b = |\bar{a}_i|\,|b| \cos\theta_i = \mu_i\,|b| \cos\theta_i \tag{2-28.3}$$
if $\theta_i$ denotes the angle between the vectors b and $\bar{a}_i$. Hence
$$\mu_i^2\bar{x}_i = \mu_i\,|b| \cos\theta_i \tag{2-28.4}$$
which gives
$$\bar{x}_i = \frac{|b| \cos\theta_i}{\mu_i} \tag{2-28.5}$$


The difficulty of solving nearly singular systems is caused by the
division by $\mu_i$ if $\mu_i$ is very small. The result of this division is that the
solution becomes very sensitive to small errors of the vector b. The
true position of the vector b is not known. It is masked by an added
error vector which is small compared with the length of b. This
error vector, however, is of a random nature, and will have com-
ponents in the direction of the large and small vectors $\bar{a}_i$. If now the
smallest vector $\bar{a}_n$ has the length $10^{-3}$ compared with the largest
vector $\bar{a}_1$ (i.e., if the ratio of the smallest to the largest eigenvalue of
$\tilde{A}A$ is $10^{-6}$), then in $\bar{x}_n$ the noise will be magnified by the factor 1000.
This can easily cause an error in the solution which renders it
physically meaningless. The solution $x_1, x_2, \ldots, x_n$ of a linear system
is frequently not biased in favor of the very small eigenvectors of
$\tilde{A}A$, but favors all the eigenvectors with the same order of magnitude.
This means that all the $\bar{x}_i$ are of the same order of magnitude. But
then the vector b is strongly slanted against the small vectors $\bar{a}_i$,
because of multiplication by the small factors $\mu_i$. If the noise were of
the same character, the damage would be slight. The ratio (5)
would come out as a number of average order of magnitude because
the numerator becomes small together with the denominator. But
in actual fact this is not so because, while b itself is strongly slanted
in disfavor of the small eigenvalues, the noise has no bias against
these directions and thus causes spurious components $\bar{x}_i$ which
overpower the true components by a large factor. The result is a
solution which may be wrong by 100% or more.

This analysis shows that the critical quantity which decides the
physical reliability of a strictly mathematical solution is not the
determinant of the system, but the ratio of the largest to the smallest
eigenvalue of the symmetrized matrix $\tilde{A}A$.¹ It is the square root of this
ratio which measures the magnification of the noise in the direction
of the smallest eigenvalue. As long as this ratio does not increase
above a certain danger point, the problem of noise is not critical.
But if that ratio becomes $10^4$ and more, magnification of the noise
in the direction of the smallest eigenvalue becomes 100 and more.
The accuracy of our physical measurements is seldom sufficient to
tolerate such an increase of the noise in certain directions. Any
linear system whose critical ratio surpasses $10^4$ can hardly be con-
sidered adequate for full determination of the unknowns of the
problem.

¹ If A has complex coefficients, the symmetric matrix $\tilde{A}A$ is replaced by the
Hermitian matrix $\tilde{A}^{*}A$.
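
For a 2 by 2 system the critical ratio can be written down in closed form, which gives a quick feeling for the danger point. The following sketch is an added illustration; the function name `critical_ratio_2x2` and the sample matrix are hypothetical, and the matrix is assumed nonsingular. It uses the facts that the trace of the symmetrized matrix is the sum of the squares of all elements of A and that its determinant is the square of the determinant of A.

```python
import math

def critical_ratio_2x2(A):
    """Largest over smallest eigenvalue of the symmetrized matrix (2 by 2 A)."""
    (a, b), (c, d) = A
    trace = a * a + b * b + c * c + d * d     # trace of the symmetrized matrix
    det = (a * d - b * c) ** 2                # its determinant, (det A) squared
    largest = (trace + math.sqrt(trace * trace - 4 * det)) / 2
    smallest = det / largest                  # avoids subtractive cancellation
    return largest / smallest

# Hypothetical nearly singular matrix of the kind discussed in the text.
A = [[1.0, 1.0],
     [1.0, 1.01]]

ratio = critical_ratio_2x2(A)
magnification = math.sqrt(ratio)              # noise magnification factor
print(ratio, magnification)
```

For larger systems the same quantity requires the eigenvalues of the symmetrized matrix, which this sketch does not attempt to compute.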

The reference system of the principal axes of A has still another
advantage. It brings out in purified form those particular combinations
of the variables which are too weakly represented in the given
system. The variables $\bar{x}_i$ are completely separated in this frame of
reference. Moreover, certain $\bar{x}_i$ enter the equations with a too small
factor. These $\bar{x}_i$ can be singled out at once as those quantities which
practically drop out of the system. But now we can go back to our
original variables $x_i$, remembering that the variables with which we
have operated had the following relation to the original $x_i$:

$$x = U\bar{x}$$

Since the matrix U is given, we can find those linear combinations of
the unknowns in which the given linear system is deficient.
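In our first oversimplified problem we considered a 2 by 2 matrix
of the following form:

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 + \epsilon \end{pmatrix}$$

where $\epsilon$ was a small quantity. We have no difficulty in solving the
associated eigenvalue problem for $\tilde{A}A$. The resulting solution
becomes, considering $\epsilon$ a small quantity,

$$\mu_1 = 2, \qquad \mu_2 = \frac{\epsilon}{2}$$

$$U = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}$$

We form $\bar{x} = \tilde{U}x$:

$$\bar{x}_1 = \frac{1}{\sqrt{2}}(x_1 + x_2), \qquad \bar{x}_2 = \frac{1}{\sqrt{2}}(-x_1 + x_2)$$

It is $\bar{x}_2$ which is connected with the small eigenvalue. Hence it is the
particular linear combination

$$-x_1 + x_2$$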

which is too weakly represented in our system. In the given simple 
example we were able to establish this fact by mere inspection. But 
in more involved systems we have no easy way of telling which 
particular linear relation (or relations) of the variables is practically 
absent in the system. The construction of the matrix U gives a
systematic answer to the problem, and the knowledge of the eigenvalues
$\mu_i^2$ presents a quantitative measure for the weakness with
which these combinations enter the given system.
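The 2 by 2 example can be checked numerically. The sketch below, written under the assumption of a real 2 by 2 matrix whose symmetrized form has a nonzero off-diagonal element, returns the smaller factor $\mu$ together with the unit vector that defines the weakly represented combination. The helper name `weakest_combination` is hypothetical.

```python
import math

def weakest_combination(a11, a12, a21, a22):
    """Return (mu_small, (u1, u2)): the smaller mu and the unit vector u such
    that u1*x1 + u2*x2 is the combination entering the system most weakly."""
    # Symmetrized matrix S = (transpose of A) * A.
    s11 = a11 * a11 + a21 * a21
    s12 = a11 * a12 + a21 * a22
    s22 = a12 * a12 + a22 * a22
    half_trace = 0.5 * (s11 + s22)
    root = math.sqrt((0.5 * (s11 - s22)) ** 2 + s12 * s12)
    largest = half_trace + root
    smallest = (s11 * s22 - s12 * s12) / largest   # equals mu_small ** 2
    # Eigenvector of S for `smallest`: (s11 - smallest) u1 + s12 u2 = 0.
    u1, u2 = s12, smallest - s11
    norm = math.hypot(u1, u2)
    if u2 < 0.0:                                   # fix the arbitrary sign
        norm = -norm
    return math.sqrt(smallest), (u1 / norm, u2 / norm)

eps = 0.01
mu_small, (u1, u2) = weakest_combination(1.0, 1.0, 1.0, 1.0 + eps)
print(mu_small)   # close to eps / 2
print(u1, u2)     # close to (-1, 1) / sqrt(2)
```

For larger systems the same information would come from a full eigenvalue routine applied to the symmetrized matrix; the closed form used here is available only in the 2 by 2 case.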

If we do not go into the detailed analysis of the noise problem by
finding the eigenvalues and eigenvectors of the matrix $\tilde{A}A$, it is still
imperative that we should convince ourselves that the physical 
noise will not drown out our alleged solution. For this purpose we 
modify the given right side by random quantities of the order of 
magnitude of the errors of the measurements and observe the influence 
of this modification on our solution. If the solution changes by too 
large amounts as the result of this perturbation, we must come to 
the conclusion that our solution, although mathematically correct, 
cannot be considered an adequate solution of the given physical 
problem. 
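The perturbation test described above can be sketched as follows for a 2 by 2 system. The noise level, the number of trials, the uniform noise distribution and the two sample matrices are all assumptions chosen for the illustration; `solve_2x2` and `perturbation_spread` are hypothetical helper names.

```python
import random

def solve_2x2(a, b):
    """Solve a 2 by 2 system by Cramer's rule; a is a nested list, b a list."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x1 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x2 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x1, x2]

def perturbation_spread(a, b, noise, trials=200, seed=1):
    """Largest change of any unknown when each right-side element is
    modified by a random amount of magnitude up to `noise`."""
    rng = random.Random(seed)
    x0 = solve_2x2(a, b)
    worst = 0.0
    for _ in range(trials):
        b_noisy = [bi + rng.uniform(-noise, noise) for bi in b]
        x = solve_2x2(a, b_noisy)
        worst = max(worst, max(abs(xi - x0i) for xi, x0i in zip(x, x0)))
    return x0, worst

eps = 0.01
a_bad = [[1.0, 1.0], [1.0, 1.0 + eps]]    # nearly singular system
a_good = [[1.0, 1.0], [-1.0, 1.0]]        # well-determined system
b = [2.0, 2.0 + eps]                      # a_bad x = b has the solution (1, 1)
x_bad, spread_bad = perturbation_spread(a_bad, b, noise=1e-3)
x_good, spread_good = perturbation_spread(a_good, b, noise=1e-3)
print(x_bad, spread_bad)    # spread far larger than the noise
print(spread_good)          # spread of the order of the noise
```

If the spread is of the order of the solution itself, the system cannot be regarded as determining the unknowns, whatever the formal solution may be.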


Bibliographical References 


[1] AITKEN, A. C., Determinants and Matrices (Interscience 
Publishers, New York, 1944). 

[2] FERRAR, W. L., Algebra (Oxford Press, New York, 1941). 

[3] FRAZER, R. A., DUNCAN, W. J., and COLLAR, A. R., Elementary
Matrices (Cambridge University Press, London, 1938; 
Macmillan, New York, 1947). 

[4] TURNBULL, H. W., The Theory of Determinants, Matrices and
Invariants (Blackie & Son, Glasgow, 1929). 


III


LARGE-SCALE LINEAR SYSTEMS


1. Historical introduction. The early masters of infinitesimal
calculus ventured out in a new direction which seemed to differ
radically from the more conservative concepts of algebra. The
operations with “infinitesimals” and quantities which “vanish in
first, second, ..., mth order,” although of obvious merits, had their
grave logical difficulties, as pointed out by George Berkeley in his
“Analyst” (1734). Before the exact limit concept of Cauchy and
Gauss emerged in the early nineteenth century, Lagrange attempted
a different solution. He showed how the concepts of algebra need
not be given up even if working with the problems of higher mathematics.
Derivatives cannot be distinguished from ordinary difference
coefficients if the $\Delta x$ is made small enough. Integration could be
replaced by summation.

This tendency to “‘algebraize’’ calculus and all processes of higher 
mathematics became of increasing importance and eventually 
revolutionized the entire edifice of higher mathematics. Fredholm 
(in 1900) put the entire theory on a rigorous basis when he showed 
how a certain class of integral equations could be conceived as the 
limit of an ordinary simultaneous set of linear algebraic equations 
whose order gradually increases to infinity. Moreover, the error 
remained small even if the order of the associated algebraic system 
was far from becoming infinite. 

While this development was in the beginning purely theoretical, it 
became in our own day of eminent practical importance in view of 
the construction of the large electronic digital calculators which made 
numerical solution of large sets of algebraic equations possible. The
resolution of a boundary value problem into a large set of algebraic
equations is thus no longer a purely theoretical but a very practical 
tool for solving partial differential equations. 

The actual inversion of a large matrix, however, is even now a 
formidable task. Instead of getting an exact solution by matrix 
inversion, it is preferable to apply simpler operations which in many 
steps come gradually nearer and nearer to the solution desired. 
These are “iteration techniques,” based on the constant repetition of 
the same algorithm. Procedures of this kind are particularly adapted 
to the coding for the big machines. The simplest and most natural 
iterations operate with the successive powers of a matrix. And thus 
we come to the investigation of polynomials, formed with the help of 
matrices. 


2. Polynomial operations with matrices. If x is an algebraic
quantity, the repeated operations of multiplication and addition lead
to a linear superposition of the powers of x, called a “polynomial of x.”

$$P_n(x) = p_n x^n + p_{n-1} x^{n-1} + \cdots + p_1 x + p_0 \tag{3-2.1}$$


Polynomials of x have a tremendous variety of applications, because 
of their flexible nature and the ease with which analytical operations 
such as differentiation and integration can be performed with their 
help. If A is a matrix, we can investigate the possibility of using 
matrix polynomials for solution of certain matrix problems. Matrix 
algebra is a complete counterpart of ordinary algebra, although the 
commutative law of multiplication has to be sacrificed. But in the 
presence of one single matrix A, assuming that the coefficients of the 
polynomial P,(A) are ordinary numbers, the noncommutative 
nature of matrices does not enter the picture, since every matrix is 
commutable with itself. And thus we can investigate the possible 
operational advantages of a matrix polynomial. 


$$P_n(A) = p_n A^n + p_{n-1} A^{n-1} + \cdots + p_1 A + p_0 \tag{3-2.2}$$


However, from the viewpoint of applications, we immediately
encounter the following difficulty. The successive powers of x are
generated by simply multiplying the previous power $x^{n-1}$ by x.

$$x^n = x \cdot x^{n-1}$$

We can similarly generate the successive powers of a matrix.

$$A^n = A \cdot A^{n-1}$$

But this operation involves a tremendous number of single operations,
since the multiplication by the matrix A means row-by-column
composition of two matrices. This amounts to n multiplications for
every element, and since we have to find $n^2$ new elements, we need
altogether $n^3$ multiplications and $n^3 - n^2$ additions for every new
power of A. This is a prohibitive number of operations.

This unfavorable balance is greatly improved, however, if we do not 
generate the matrix polynomial (2) as a matrix but let this polynomial 
operate on a given vector b. Hence we consider the polynomial 
operation 


$$P_n(A)b = (p_n A^n + p_{n-1} A^{n-1} + \cdots + p_1 A + p_0)b \tag{3-2.3}$$


This is no longer a matrix but a vector, obtainable by successive 
substitutions in the scheme: 


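$$Ab = \begin{pmatrix} a_{11}b_1 + a_{12}b_2 + \cdots + a_{1n}b_n \\ a_{21}b_1 + a_{22}b_2 + \cdots + a_{2n}b_n \\ \vdots \\ a_{n1}b_1 + a_{n2}b_2 + \cdots + a_{nn}b_n \end{pmatrix} \tag{3-2.4}$$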


The number of multiplications is reduced to $n^2$, and the number of
additions to $n^2 - n$. The operation $A^k b$ is obtainable by k successive
substitutions of this kind, according to the associative law of multiplication.

$$b_k = A^k b = A(A^{k-1}b) = Ab_{k-1} \tag{3-2.5}$$
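A minimal sketch of this economy, in plain Python with matrices held as nested lists: the powers of A are never formed, only matrix-vector products. The nested (Horner-style) evaluation of the polynomial is one convenient arrangement and is an assumption of this sketch, not a scheme prescribed in the text; the function names are likewise hypothetical.

```python
def matvec(a, v):
    """Row-by-column product of a square matrix with a vector:
    n multiplications for each of the n elements."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def power_times_vector(a, b, k):
    """A^k b by k successive substitutions b_j = A b_(j-1), as in (3-2.5)."""
    for _ in range(k):
        b = matvec(a, b)
    return b

def polynomial_times_vector(coeffs, a, b):
    """(p_n A^n + ... + p_1 A + p_0) b, with coeffs = [p_n, ..., p_1, p_0].
    The nesting keeps every intermediate result a vector."""
    result = [coeffs[0] * bi for bi in b]
    for p in coeffs[1:]:
        result = [ri + p * bi for ri, bi in zip(matvec(a, result), b)]
    return result

a = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -1.0]
print(power_times_vector(a, b, 3))
# x^2 - 5x + 5 is the characteristic polynomial of this sample matrix,
# so the polynomial operation annihilates every vector.
print(polynomial_times_vector([1.0, -5.0, 5.0], a, b))
```

Each call to `matvec` costs about $n^2$ multiplications, so a polynomial of order k applied to a vector costs about $k n^2$ multiplications instead of $k n^3$.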


Moreover, certain polynomials satisfy some simple “recurrence
relations,” by means of which the polynomial $P_n(A)b$ may be obtained
in a simple way in terms of $P_{n-1}(A)b$ and $P_{n-2}(A)b$. All the
important polynomials of applied mathematics satisfy a recurrence
relation of the following type (cf. V, 21):

$$P_{n+1}(x) = (\alpha_n x - \beta_n)P_n(x) - \gamma_n P_{n-1}(x) \tag{3-2.6}$$

If a polynomial of this kind is applied to matrices, we obtain

$$P_{n+1}(A)b_0 = (\alpha_n A - \beta_n)P_n(A)b_0 - \gamma_n P_{n-1}(A)b_0 \tag{3-2.7}$$


Let us agree that the vector generated by the polynomial $P_n(A)$,
operating on a given “trial vector” $b_0$, shall be denoted by $b_n$.

$$b_n = P_n(A)b_0 \tag{3-2.8}$$

Then the relation (7) implies the following successive generation of
the vectors $b_n$:

$$b_{n+1} = (\alpha_n A - \beta_n)b_n - \gamma_n b_{n-1} \tag{3-2.9}$$
The nature of the operation $P_n(A)b_0$ can best be analyzed if we
introduce the principal axes associated with the matrix A (cf. Sections
6 and 7).

$$Au = \lambda u, \qquad \tilde{A}v = \lambda v \tag{3-2.10}$$

The first equation has the n solutions (omitting the “defective” case)

$$u_1, u_2, \ldots, u_n$$

The second equation has the n solutions

$$v_1, v_2, \ldots, v_n$$

This double set of vectors is in a biorthogonality relation to each
other.

$$u_i v_k = 0 \quad (i \neq k), \qquad u_i v_i = 1 \tag{3-2.11}$$

Let us analyze the vector $b_0$ in the reference system of the base
vectors $u_i$:

$$b_0 = \beta_1 u_1 + \beta_2 u_2 + \cdots + \beta_n u_n \tag{3-2.12}$$

where

$$\beta_i = b_0 v_i$$

Then by the definition of the principal axes we obtain

$$Ab_0 = \beta_1 \lambda_1 u_1 + \beta_2 \lambda_2 u_2 + \cdots + \beta_n \lambda_n u_n$$

$$A^k b_0 = \beta_1 \lambda_1^k u_1 + \beta_2 \lambda_2^k u_2 + \cdots + \beta_n \lambda_n^k u_n$$

and for an arbitrary polynomial operator $P_n(A)$,

$$P_n(A)b_0 = \beta_1 P_n(\lambda_1)u_1 + \beta_2 P_n(\lambda_2)u_2 + \cdots + \beta_n P_n(\lambda_n)u_n \tag{3-2.13}$$

We see that by introduction of the principal axes of A as a new frame
of reference, operation of the matrix polynomial $P_n(A)$ is reducible to
operation of the purely algebraic polynomial $P_n(\lambda)$, replacing $\lambda$ by
the successive eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of A.
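The reduction (3-2.13) can be verified on a small case. The sketch below assumes a symmetric 2 by 2 sample matrix whose principal axes are known in closed form (so that the adjoint vectors coincide with the $u_i$), together with an arbitrary sample polynomial and trial vector; none of these choices come from the text.

```python
import math

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Sample symmetric matrix with known principal axes (so v_i = u_i).
a = [[2.0, 1.0], [1.0, 2.0]]
r = 1.0 / math.sqrt(2.0)
eigenvalues = [3.0, 1.0]
eigenvectors = [[r, r], [r, -r]]

def poly(x):
    """Sample polynomial P(x) = x^3 - 2x + 1/2 (an arbitrary choice)."""
    return x ** 3 - 2.0 * x + 0.5

b0 = [0.3, -1.2]

# P(A) b0 formed directly from successive matrix-vector products.
ab = matvec(a, b0)
a3b = matvec(a, matvec(a, ab))
direct = [a3b[i] - 2.0 * ab[i] + 0.5 * b0[i] for i in range(2)]

# The same vector from (3-2.13): beta_i = b0 . v_i, then the sum of
# beta_i * P(lambda_i) * u_i over the principal axes.
betas = [dot(b0, v) for v in eigenvectors]
via_axes = [
    sum(beta * poly(lam) * u[i]
        for beta, lam, u in zip(betas, eigenvalues, eigenvectors))
    for i in range(2)
]
print(direct)
print(via_axes)
```

The two printed vectors should agree to rounding accuracy.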


3. The p,q algorithm. A particularly useful set of polynomials
is established by the matrix A itself if we add to it an arbitrary
vector b which plays the role of the “right side” of the linear equation
Ax = b associated with the given matrix. For our iteration technique,
b will be identified with an arbitrary “trial vector” $b_0$, formed out of
random numbers, on which the matrix A is going to operate. We can
construct the successive powers $A^k b_0 = b_k$ by generating in succession,

$$b_0, \quad b_1 = Ab_0, \quad b_2 = Ab_1 = A^2 b_0, \quad \ldots, \quad b_n = Ab_{n-1} = A^n b_0 \tag{3-3.1}$$


In view of the Hamilton-Cayley identity, this last vector must
become a linear combination of the previous vectors. Although
generally the previous vectors are linearly independent, we can at
every stage of the procedure ask for the “best identity” which can be
established between them. This means that while the linear combination

$$p_k = b_k + \gamma_{k-1} b_{k-1} + \cdots + \gamma_0 b_0 \tag{3-3.2}$$

cannot be made zero for any choice of the $\gamma_i$, we can nevertheless
minimize the length of this vector. If A is symmetric, we ask for the
minimum of the scalar

$$p_k^2 = (b_k + \gamma_{k-1} b_{k-1} + \cdots + \gamma_0 b_0)^2 \tag{3-3.3}$$

If A is not symmetric, we have to distinguish between the operations
with A itself and with the adjoint matrix $\tilde{A}$. We then have
to complement the vector set (3.1) by the adjoint set,

$$b_0^*, \quad b_1^* = \tilde{A}b_0^*, \quad \ldots, \quad b_k^* = \tilde{A}b_{k-1}^* \tag{3-3.4}$$

The length to be minimized is now defined by the scalar

$$p_k p_k^* = (b_k + \gamma_{k-1} b_{k-1} + \cdots + \gamma_0 b_0)(b_k^* + \gamma_{k-1} b_{k-1}^* + \cdots + \gamma_0 b_0^*) \tag{3-3.5}$$
The solution of this minimum problem leads to a set of linear
equations for the $\gamma_i$, characterized by a special kind of matrix,
called “recurrent.” However, instead of solving this set independently
for every k, we can develop an easily coded progressive algorithm¹
which generates the polynomials

$$p_k(A)b_0 = (A^k + \gamma_{k-1}A^{k-1} + \cdots + \gamma_0)b_0 \tag{3-3.6}$$

and an interlocked second set of polynomials $q_k(A)b_0$ gradually, in
successive steps. Finally we reach the stage k = n, and $p_n(A)b_0$
becomes identically zero. The polynomial $p_n(\lambda)$ coincides with the
characteristic equation of the matrix; its roots give all the eigenvalues
of A.

¹ Cf. the paper [5] of the bibliography, which contains an exhaustive treatment
of the numerous mathematically interesting properties of these polynomials.

The polynomials $p_k(\lambda)$ [and also $q_k(\lambda)$] are of the type of the
classical orthogonal polynomials. They satisfy a recurrence relation
of the form (cf. V, 21)

$$p_{k+1}(\lambda) = (\lambda - \alpha_k)p_k(\lambda) - \beta_k p_{k-1}(\lambda) \tag{3-3.7}$$

The numerical constants $\alpha_k$ and $\beta_k$ of these relations are not given in
advance; they are determined by the matrix itself, in conjunction
with the right side $b_0$, and they come into evidence gradually, as the
algorithm unfolds itself.
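For a symmetric matrix the minimum condition makes each new vector orthogonal to its predecessors, and the constants of the recurrence then follow from simple scalar products. The sketch below uses that orthogonality form, which is the recurrence commonly presented today as the symmetric Lanczos iteration without normalization; it is offered as an illustration of (3-3.7) only and omits the interlocked q polynomials and any reorthogonalization. The function names and the sample matrix (whose eigenvalues are known in closed form) are assumptions.

```python
import math

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def monic_recurrence(a, b0):
    """Generate p_0 = b0, p_1, ..., p_n by the three-term recurrence and
    return (alphas, betas, length of p_n). Symmetric matrices only."""
    n = len(b0)
    p_prev = [0.0] * n          # stands in for p_(-1); its beta is zero
    p = list(b0)
    alphas, betas = [], []
    prev_norm2 = None
    for _ in range(n):
        ap = matvec(a, p)
        norm2 = dot(p, p)
        alpha = dot(p, ap) / norm2
        beta = 0.0 if prev_norm2 is None else norm2 / prev_norm2
        alphas.append(alpha)
        betas.append(beta)
        p_next = [ap[i] - alpha * p[i] - beta * p_prev[i] for i in range(n)]
        p_prev, p, prev_norm2 = p, p_next, norm2
    return alphas, betas, math.sqrt(dot(p, p))

def last_polynomial(alphas, betas, lam):
    """Evaluate the scalar polynomial p_n(lam) from the same recurrence."""
    q_prev, q = 0.0, 1.0
    for alpha, beta in zip(alphas, betas):
        q_prev, q = q, (lam - alpha) * q - beta * q_prev
    return q

# Sample matrix: second differences, eigenvalues 2 - 2 cos(j pi / (n + 1)).
n = 4
a = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
alphas, betas, residual = monic_recurrence(a, [1.0, 2.0, 0.0, -1.0])
known = [2.0 - 2.0 * math.cos(j * math.pi / (n + 1)) for j in range(1, n + 1)]
print(residual)                                            # length of p_n(A) b0
print([last_polynomial(alphas, betas, lam) for lam in known])
```

For so small a matrix the rounding errors are harmless; for large matrices the orthogonality is gradually lost and has to be restored.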

The p,q algorithm gives a complete solution of the eigenvalue 
problem. It yields all the eigenvalues and eigenvectors of the matrix 
in successive approximations which constantly improve and in the 
end become accurate (apart from the rounding errors). Of particular 
importance are the beginning stages of the process. This phase of the 
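algorithm deserves attention of its own since it has many useful
applications.

We construct the three vectors

$$b_0, \quad b_1, \quad b_2$$

$$b_0^*, \quad b_1^*, \quad b_2^*$$

and form their scalar products:

$$\begin{aligned} c_0 &= b_0 b_0^* \\ c_1 &= b_1 b_0^* = b_0 b_1^* \\ c_2 &= b_2 b_0^* = b_1 b_1^* = b_0 b_2^* \\ c_3 &= b_2 b_1^* = b_1 b_2^* \end{aligned} \tag{3-3.8}$$

(In the case of a symmetric matrix, $b_i^* = b_i$, while in the case of a
Hermitian matrix, $b_i^* = \bar{b}_i$.) Then the determinant condition

$$\begin{vmatrix} 1 & \lambda & \lambda^2 \\ c_0 & c_1 & c_2 \\ c_1 & c_2 & c_3 \end{vmatrix} = 0 \tag{3-3.9}$$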
Cy Cg Cg 


will give us two roots which are near to the two largest eigenvalues of
the matrix A if we use a trial vector $b_0$ which is strongly biased in
favor of the large eigenvalues. This can be done by multiplying an
arbitrary random vector $v_0$, $v_0^*$ several times by $A$, $\tilde{A}$; i.e., by forming
in succession

$$v_1 = Av_0, \quad v_2 = Av_1, \quad \ldots, \quad v_m = Av_{m-1}$$

$$v_1^* = \tilde{A}v_0^*, \quad v_2^* = \tilde{A}v_1^*, \quad \ldots, \quad v_m^* = \tilde{A}v_{m-1}^*$$

and then choosing $b_0 = v_m$, $b_0^* = v_m^*$. If m is as high as 5 or 6, the
largest eigenvalues are already predominant. In this fashion the
largest eigenvalue of a symmetric matrix or the largest pair of
conjugate complex roots of an arbitrary matrix can be obtained in
good approximation. The method is a natural extension of the
Bernoulli method of finding the largest root of an algebraic equation
(cf. 1-15.10). The same method is applicable to the frequently even
more important problem of finding the smallest eigenvalues of a
matrix, if we use a transformation which reverses the order of
magnitude of the eigenvalues (cf. §§ 9 and 10).

The eigenvectors associated with the two largest roots are also
obtainable in good approximation. Since equation (9) expresses the
approximate validity of the relation,

$$(A - \lambda_1 I)(A - \lambda_2 I)b_0 = 0$$

we obtain for the eigenvector $u_1$ (associated with $\lambda_1$),

$$u_1 = Ab_0 - \lambda_2 b_0 = b_1 - \lambda_2 b_0$$

or in better approximation, multiplying by A,

$$u_1 = b_2 - \lambda_2 b_1 \tag{3-3.10}$$

and similarly for the eigenvector $u_2$ (associated with $\lambda_2$),

$$u_2 = b_2 - \lambda_1 b_1 \tag{3-3.11}$$

Likewise for the adjoint vectors,

$$u_1^* = b_2^* - \lambda_2 b_1^*, \qquad u_2^* = b_2^* - \lambda_1 b_1^* \tag{3-3.12}$$

(The accuracy of $u_2, u_2^*$ is greatly diminished compared with $u_1, u_1^*$,
if the absolute value of $\lambda_1$ strongly overshadows that of $\lambda_2$; in the
complex conjugate case, however, both vectors have the same degree
of accuracy.)
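The beginning stage of the process can be sketched for the symmetric case, where the adjoint vectors coincide with the original ones. The test matrix below is an assumption of the sketch: it is built from prescribed eigenvalues by a Householder reflection so that the answer is known in advance. The rescaling of the trial vector after each multiplication is added only to keep the numbers moderate; the function names are hypothetical.

```python
import math

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def two_largest(a, v0, m=6):
    """Symmetric case of (3-3.8) to (3-3.11): returns (lam1, lam2, u1, u2)."""
    b0 = list(v0)
    for _ in range(m):                  # bias the trial vector: v_m ~ A^m v_0
        b0 = unit(matvec(a, b0))
    b1 = matvec(a, b0)
    b2 = matvec(a, b1)
    c0, c1, c2, c3 = dot(b0, b0), dot(b1, b0), dot(b1, b1), dot(b2, b1)
    # Expanding the determinant (3-3.9) along its first row gives
    # (c0 c2 - c1^2) lam^2 - (c0 c3 - c1 c2) lam + (c1 c3 - c2^2) = 0.
    qa = c0 * c2 - c1 * c1
    qb = -(c0 * c3 - c1 * c2)
    qc = c1 * c3 - c2 * c2
    disc = math.sqrt(qb * qb - 4.0 * qa * qc)   # real roots assumed
    lam1 = (-qb + disc) / (2.0 * qa)
    lam2 = (-qb - disc) / (2.0 * qa)
    u1 = unit([x - lam2 * y for x, y in zip(b2, b1)])   # (3-3.10)
    u2 = unit([x - lam1 * y for x, y in zip(b2, b1)])   # (3-3.11)
    return lam1, lam2, u1, u2

# Test matrix A = H D H with prescribed eigenvalues d; H is a Householder
# reflection, so column k of H is the eigenvector belonging to d[k].
d = [10.0, 6.0, 1.0, 0.5]
w = unit([1.0, 2.0, 3.0, 4.0])
h = [[(1.0 if i == j else 0.0) - 2.0 * w[i] * w[j] for j in range(4)]
     for i in range(4)]
a = [[sum(h[i][k] * d[k] * h[k][j] for k in range(4)) for j in range(4)]
     for i in range(4)]

lam1, lam2, u1, u2 = two_largest(a, [1.0, 1.0, 1.0, 1.0])
print(lam1, lam2)
print(abs(dot(u1, [row[0] for row in h])), abs(dot(u2, [row[1] for row in h])))
```

The quality of the result depends on how strongly the remaining eigenvalues are suppressed by the m preliminary multiplications; for closely spaced eigenvalues a larger m would be needed.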


4. The Chebyshev polynomials. The previous section dealt with
polynomials which were generated by the matrix A itself. The $\alpha_k, \beta_k$
of the recurrence relation (3.7) had to be obtained in successive steps,
in the course of a progressive minimization process. Much simpler
are processes which employ universal polynomials, whose $\alpha_k, \beta_k$
are given constants. Amongst these polynomials the simplest and
most useful ones are the Chebyshev polynomials, named after the
Russian mathematician P. L. Chebyshev (1821-1894). They have
the advantage that for them the $\alpha_k, \beta_k$ are constants which are
independent of k. The coding of these polynomials is thus particularly
simple.

We will encounter this remarkable set of polynomials in a great
many garbs (cf. IV, 16; V, 20; Chapter VII). At present we are
interested in their applicability to matrix operations, on account of 
the particularly simple recurrence relation by which they can be 
generated. 

We take our start from the simple trigonometric identity

$$\cos (n + 1)\theta + \cos (n - 1)\theta = 2 \cos\theta \cos n\theta \tag{3-4.1}$$

Similarly

$$\sin (n + 1)\theta + \sin (n - 1)\theta = 2 \cos\theta \sin n\theta \tag{3-4.2}$$

These identities make it possible that, knowing the values of $\cos\theta$
and $\sin\theta$, all the later values of $\cos n\theta$ and $\sin n\theta$ can be evaluated
in successive steps. Indeed, having obtained $\cos n\theta$, the identity (1)
immediately gives $\cos (n + 1)\theta$ and thus the scheme continues. The
same is true of the sine functions.¹ Now, if we put $\cos\theta = x$ and
start with $\cos 0 = 1$, $\cos\theta = x$, the recurrence relation (1) gives for
$\cos 2\theta$ a quadratic expression in x, then $\cos 3\theta$ follows as a cubic
expression in x, and so on. We thus see that the recurrence relation
(1) may be rewritten in the following form:

$$T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x) \tag{3-4.3}$$

thus defining a set of polynomials. What we obtain is still $\cos n\theta$ in
value but this $\cos n\theta$ is expressed as a polynomial of the order n in
the variable $x = \cos\theta$:

$$\cos n\theta = T_n(x) \tag{3-4.4}$$

¹ It is of interest to observe that the first trigonometric tables constructed by
the Hindus actually calculated the values of sin x from degree to degree from
this recurrence relation, starting with sin 1° as key value which was obtained
from other considerations.


Quite similar is the situation with respect to the recurrence relation
(2) if we divide on both sides by $\sin\theta$. We can now start with

$$\frac{\sin\theta}{\sin\theta} = 1, \qquad \frac{\sin 2\theta}{\sin\theta} = 2x$$

and obtain $\sin 3\theta/\sin\theta$ as a quadratic expression in x, then $\sin 4\theta/\sin\theta$
as a cubic expression in x, and so on. Generally we now obtain the
value of $\sin (n + 1)\theta/\sin\theta$ but expressed as a polynomial of the order
n in the variable $x = \cos\theta$:

$$\frac{\sin (n + 1)\theta}{\sin\theta} = U_n(x) \tag{3-4.5}$$

The recurrence relation is exactly the same as before:

$$U_{n+1}(x) = 2xU_n(x) - U_{n-1}(x) \tag{3-4.6}$$

only the starting point is different because now

$$U_0(x) = 1, \qquad U_1(x) = 2x \tag{3-4.7}$$

against

$$T_0(x) = 1, \qquad T_1(x) = x \tag{3-4.8}$$
The relation

$$x = \cos\theta \tag{3-4.9}$$

has a simple significance. The interval of x extends from -1 to +1.
We erect a circle over this interval as diameter.
Then $\theta$ is the central angle of the point P′ on
the circumference of the circle which is
projected down on the x-axis.

Sometimes, however, x ranges only between
0 and 1. In this case we put

$$x = \frac{1 - \cos\theta}{2} = \sin^2\frac{\theta}{2} \tag{3-4.10}$$

and define the “shifted Chebyshev polynomials” $T_n^*(x)$ by the recurrence
relation

$$T_{n+1}^*(x) = 2(1 - 2x)T_n^*(x) - T_{n-1}^*(x) \tag{3-4.11}$$

starting with

$$T_0^*(x) = 1, \qquad T_1^*(x) = 1 - 2x$$

Similarly the “shifted Chebyshev polynomials of the second kind”
are defined by the same recurrence relation

$$U_{n+1}^*(x) = 2(1 - 2x)U_n^*(x) - U_{n-1}^*(x) \tag{3-4.12}$$

but starting with

$$U_0^*(x) = 1, \qquad U_1^*(x) = 2(1 - 2x) \tag{3-4.13}$$
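The recurrence relations of this section translate directly into a few lines of code. The following sketch generates the values of the polynomials of both kinds at a given point, together with their shifted counterparts; the function names and the `first_kind` flag are assumptions of the illustration.

```python
import math

def chebyshev_sequence(x, n, first_kind=True):
    """[T_0(x), ..., T_n(x)] or [U_0(x), ..., U_n(x)], generated by the
    common recurrence (3-4.3) / (3-4.6) from the two starting values."""
    values = [1.0, x if first_kind else 2.0 * x]
    for _ in range(2, n + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: n + 1]

def shifted_chebyshev_sequence(x, n, first_kind=True):
    """Shifted polynomials of (3-4.11) to (3-4.13); the factor 2(1 - 2x)
    means they are the ordinary polynomials taken at the argument 1 - 2x."""
    return chebyshev_sequence(1.0 - 2.0 * x, n, first_kind)

theta = 0.7
t_values = chebyshev_sequence(math.cos(theta), 6)
u_values = chebyshev_sequence(math.cos(theta), 6, first_kind=False)
ts_values = shifted_chebyshev_sequence(math.sin(theta / 2.0) ** 2, 6)
for k in range(7):
    # Each row shows T_k(x) next to cos(k theta), and U_k(x) sin(theta)
    # next to sin((k + 1) theta).
    print(k, t_values[k], math.cos(k * theta),
          u_values[k] * math.sin(theta), math.sin((k + 1) * theta))
```

Since the coefficients of the recurrence do not depend on the order, the loop body is the same for every step, which is the coding advantage referred to above.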


5. Spectroscopic eigenvalue analysis. It is a frequent occurrence 
in vibration problems that the characteristic frequencies of an 
elastic structure are determined by the eigenvalues of a given 
differential operator. This operator is then approximated by a 
finite difference operator, and the problem becomes a matrix 
problem of finite order. Of particular physical interest are the small 
eigenvalues of this operator. In the realm of large eigenvalues (high 
frequencies) the error caused by the change from a differential to a 
difference operator becomes too pronounced. Moreover, the eigen- 
vectors associated with the large eigenvalues are usually excited with 
very small amplitudes and are thus of minor practical significance. 

In the p,q algorithm considered in §3 the eigenvalues appear 
gradually, from the top to the bottom. It is relatively easy to obtain 
the large eigenvalues of a matrix; the small eigenvalues, however, 
come only late in appearance, at the end of a careful orthogonaliza- 
tion process. And yet frequently our entire interest centers around 
the small eigenvalues and associated eigenvectors. 

One possible way out of the difficulty is preliminary inversion of 
the matrix, which, however, in the case of large matrices is no easy 
task. Hence it is of interest that the Chebyshev polynomials provide 
a tool which opens a new perspective in the eigenvalue analysis of 
symmetric matrices or generally arbitrary matrices whose eigenvalues 
are real. We can search for the eigenvalues of a matrix in any range 
we like, somewhat as a spectroscope can scan the line-spectrum of a 
vibrating atom in an arbitrary prescribed frequency range. We 
resolve the eigenvalue spectrum into spectral lines which can be 
made practically independent of each other. The usual “‘drowning- 
out” of the small eigenvalues by the large ones is circumvented. The 
eigenvalues $\lambda_i$ are projected up on a circle. On this circle the concept
of “small” and “large” loses its significance. Any point of the circle
is equivalent to any other point, since any point can be conceived as 
the beginning or end of the circle. 

The Chebyshev polynomials are restricted to a definite range of the
variable x. The transformation

$$x = \cos\theta \tag{3-5.1}$$

demands that x shall lie between ±1. In order to insure this range, it
will be necessary that the eigenvalues of A shall be properly normalized.
This is made possible by a theorem of Geršgorin (1931) which
establishes an upper bound for the absolutely largest eigenvalue
$\lambda = \lambda_M$ of a matrix. This theorem is based on the following consideration.
We start with the equations which define an arbitrary eigenvalue
$\lambda$ and its associated eigenvector:

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = \lambda x_i \tag{3-5.2}$$

The particular solution which belongs to $\lambda_M$ shall be denoted by

$$u_M = (\xi_1, \xi_2, \ldots, \xi_n)$$

Among these (generally complex) components we select the
absolutely largest $\xi_i$; let it be a certain $\xi_\alpha$. Then equation (2) which
belongs to $i = \alpha$ gives

$$\lambda_M = a_{\alpha 1}\frac{\xi_1}{\xi_\alpha} + a_{\alpha 2}\frac{\xi_2}{\xi_\alpha} + \cdots + a_{\alpha n}\frac{\xi_n}{\xi_\alpha} \tag{3-5.3}$$

Now we know that the absolute value of a sum of complex numbers
can never exceed the sum of the absolute values of these numbers.

$$|\lambda_M| \le |a_{\alpha 1}|\left|\frac{\xi_1}{\xi_\alpha}\right| + |a_{\alpha 2}|\left|\frac{\xi_2}{\xi_\alpha}\right| + \cdots + |a_{\alpha n}|\left|\frac{\xi_n}{\xi_\alpha}\right| \tag{3-5.4}$$

But the second factors are necessarily numbers between 0 and 1, and
thus

$$|\lambda_M| \le |a_{\alpha 1}| + |a_{\alpha 2}| + \cdots + |a_{\alpha n}| \tag{3-5.5}$$

Let us now form the sum of the absolute values of the elements of
each row.

$$|a_{i1}| + |a_{i2}| + \cdots + |a_{in}| = s_i \tag{3-5.6}$$


These are n positive numbers. One of them is the largest; let us
denote this particular $s_i$ by the symbol s. Then for all i, $s_i \le s$, and
hence according to (5),

$$|\lambda_M| \le s_\alpha \le s \tag{3-5.7}$$

We have thus found a definite upper bound for the absolutely largest
eigenvalue of an arbitrary matrix by forming the absolute sum of the
elements of any row and then choosing the maximum of these n
positive numbers. Since the transposed matrix has the same eigenvalues
as the original one, we can perform the same operation with
the columns instead of the rows. The smaller of the two maxima will
give a better upper bound for $|\lambda_M|$.
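The bound is obtained with a handful of additions. A minimal sketch, with the hypothetical name `gershgorin_bound` and two sample matrices whose largest eigenvalues are known in closed form:

```python
import math

def gershgorin_bound(a):
    """Upper bound for the absolutely largest eigenvalue of a square matrix:
    the smaller of the largest absolute row sum and the largest absolute
    column sum."""
    n = len(a)
    row_max = max(sum(abs(x) for x in row) for row in a)
    col_max = max(sum(abs(row[j]) for row in a) for j in range(n))
    return min(row_max, col_max)

# Eigenvalues 2.5 and 0 (trace 2.5, determinant 0); the column sums give
# the sharper bound here.
sample = [[2.0, 2.0], [0.5, 0.5]]
print(gershgorin_bound(sample))

# Second-difference matrix of order 3; its largest eigenvalue is 2 + sqrt(2).
second_diff = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
print(gershgorin_bound(second_diff), 2.0 + math.sqrt(2.0))
```

In the second example the bound overrates the largest eigenvalue, which illustrates that the estimate is safe without being sharp.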

The estimation of Geršgorin is always safe but frequently unrealistic;
i.e., we may greatly overrate the largest eigenvalue by this
estimate. A more realistic, but not necessarily safe, estimate is
possible on the basis of the procedure of § 3 (cf. 3-3.9) which obtained
the largest eigenvalue of an arbitrary matrix in approximation.

Now let us assume that we have a matrix A whose eigenvalues are
known to be real, although they may be positive or negative. In this
case we will operate with the following “scaled” matrix:

$$C = \frac{2}{s}A \tag{3-5.8}$$

The eigenvalues of C will be bounded by ±2. On the other hand,
let it be known that all the eigenvalues of A are positive or zero. In
this case we will put

$$C = 2I - \frac{4}{s}A \tag{3-5.9}$$

Once more the eigenvalues of C will lie between ±2.

We start with a “trial vector” $b_0$ composed of random numbers,
and generate a sequence of vectors, characterized by the recurrence
relation (cf. 4.3):

$$b_{k+1} = Cb_k - b_{k-1} \tag{3-5.10}$$

and starting with

$$b_0 = b_0, \qquad b_1 = \tfrac{1}{2}Cb_0 \tag{3-5.11}$$

We analyze the trial vector $b_0$ in the reference system of the eigenvectors
of A.

$$b_0 = \beta_1 u_1 + \beta_2 u_2 + \cdots + \beta_n u_n \tag{3-5.12}$$

An arbitrary $b_k$ of the sequence generated on the basis of the recurrence
relations (10), (11) becomes

$$b_k = \beta_1 (\cos k\theta_1)u_1 + \beta_2 (\cos k\theta_2)u_2 + \cdots + \beta_n (\cos k\theta_n)u_n \tag{3-5.13}$$

where the angles $\theta_1, \theta_2, \ldots, \theta_n$ are associated with the eigenvalues
of the matrix C, which are $2\cos\theta_i$. In terms of the eigenvalues $\lambda_i$ of the
scaled matrix A/s, the correlation in the case (8) becomes

$$\lambda_i = \cos\theta_i \tag{3-5.14}$$

while in the case (9) we have

$$\lambda_i = \frac{1 - \cos\theta_i}{2} = \sin^2\frac{\theta_i}{2} \tag{3-5.15}$$

Now we can introduce the following function f(t) of the continuous
variable t:

$$f(t) = (\beta_1 \cos\theta_1 t)u_1 + (\beta_2 \cos\theta_2 t)u_2 + \cdots + (\beta_n \cos\theta_n t)u_n \tag{3-5.16}$$

Then the vectors $b_k$ represent the values of f(t) at integer points
t = k:

$$b_k = f(k) \tag{3-5.17}$$
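The iteration (3-5.10), (3-5.11) and the resulting representation (3-5.13) can be checked on a matrix whose principal axes are known. The sketch below assumes the second-difference matrix of order 4 as test case, scaled according to (3-5.9); the trial vector, the number of steps and the function name `chebyshev_vectors` are likewise assumptions.

```python
import math

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def chebyshev_vectors(c, b0, steps):
    """[b_0, b_1, ..., b_steps] from (3-5.10) and (3-5.11); steps >= 1."""
    b_prev = list(b0)
    b = [0.5 * x for x in matvec(c, b0)]
    sequence = [b_prev, b]
    for _ in range(steps - 1):
        cb = matvec(c, b)
        b_prev, b = b, [x - y for x, y in zip(cb, b_prev)]
        sequence.append(b)
    return sequence

# Test matrix: second differences of order n, with the positive
# eigenvalues 4 sin^2(j pi / (2n + 2)).
n = 4
a = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
s = 4.0                                   # largest absolute row sum
c = [[(2.0 if i == j else 0.0) - (4.0 / s) * a[i][j] for j in range(n)]
     for i in range(n)]                   # scaling (3-5.9)

# Known principal axes of this particular matrix and the angles theta_j.
thetas = [j * math.pi / (n + 1) for j in range(1, n + 1)]
axes = [[math.sqrt(2.0 / (n + 1)) * math.sin(i * th) for i in range(1, n + 1)]
        for th in thetas]

b0 = [0.7, -0.2, 1.1, 0.4]
betas = [sum(x * y for x, y in zip(b0, u)) for u in axes]
vectors = chebyshev_vectors(c, b0, 50)

def predicted(k):
    """Right side of (3-5.13)."""
    return [sum(beta * math.cos(k * th) * u[i]
                for beta, th, u in zip(betas, thetas, axes)) for i in range(n)]

worst = max(abs(x - y) for k in range(51)
            for x, y in zip(vectors[k], predicted(k)))
print(worst)   # largest deviation between iteration and (3-5.13)
```

Each step costs one matrix-vector product and one vector subtraction, and only the two latest vectors have to be kept if a single element of each $b_k$ is all that is recorded.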
Our problem can thus be formulated as follows. “We have a function
f(t) composed of purely periodic components of variable amplitudes
and variable frequencies. This function is observed at the equidistant
time moments

$$t = 0, 1, 2, \ldots, N$$

Find the unknown frequency $\theta_i$ of each one of the periodic
components.”

A problem of this kind is encountered in tide research, in meteorology,
in astronomy, in economic research, and in other fields where
the resultant interaction of certain periodic components is given, and
our aim is to disentangle these components and obtain the amplitude
and frequency of each of the constituent vibrations separately. The
problem is often called “search for hidden periodicities” (cf. IV, 22).
The solution is obtained with the help of the “Fourier transform”
(cf. IV, 17). We transform the original f(t) into a new function
F(p), with the help of the Fourier transform, which in our case
assumes the form of a sum instead of an integral.

$$F(p) = \tfrac{1}{2} f(0) + f(1) \cos\frac{\pi}{N}p + f(2) \cos 2\frac{\pi}{N}p + \cdots + \tfrac{1}{2} f(N) \cos N\frac{\pi}{N}p \tag{3-5.18}$$


This function reveals the existence of a periodic component $\cos\theta_i t$ by
having a peak at the point

$$p = \frac{N}{\pi}\theta_i \tag{3-5.19}$$

because if f(t) is of the form $\cos\theta_i t$, the associated function F(p)
becomes

$$F(p) = K\left(\frac{N}{\pi}\theta_i + p\right) + K\left(\frac{N}{\pi}\theta_i - p\right) \tag{3-5.20}$$

where

$$K(\xi) = \frac{1}{4} \sin\pi\xi \,\cot\frac{\pi\xi}{2N} \tag{3-5.21}$$

The function $K(\xi)$ is essentially a function of $\xi$ alone, since for small
values of $\xi$ we can write

$$K(\xi) = \frac{N}{2}\,\frac{\sin\pi\xi}{\pi\xi} \tag{3-5.22}$$


The height of the peak increases uniformly with N but the shape of
the function F(p) in the neighborhood of the peak $\xi = 0$ does not
change with N. Since $\theta$ ranges between 0 and $\pi$, the range of p
extends from 0 to N; F(p) is an even function of p and thus negative
values of p need not be considered. The factor N on the right side of
(19) shows that the “resolution power” of the method increases
linearly with N, the number of iterations. While the shape of the
mountain which characterizes a certain maximum of the function
F(p), does not change, the sharpness of the peak is proportional to N
because with increasing N the peaks move further apart and eventually
even very close peaks can be separated.
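The closed form (3-5.20), (3-5.21) can be compared numerically with the sum (3-5.18) for a single cosine component. In the sketch below the values of N and of the frequency are arbitrary assumptions, and the names `cosine_sum` and `kernel` are hypothetical.

```python
import math

def cosine_sum(f_values, p):
    """F(p) of (3-5.18): cosine sum over t = 0..N with the first and the
    last term halved."""
    big_n = len(f_values) - 1
    total = 0.5 * f_values[0] + 0.5 * f_values[big_n] * math.cos(math.pi * p)
    for t in range(1, big_n):
        total += f_values[t] * math.cos(math.pi * t * p / big_n)
    return total

def kernel(xi, big_n):
    """K(xi) of (3-5.21); xi must not be zero."""
    return 0.25 * math.sin(math.pi * xi) / math.tan(math.pi * xi / (2.0 * big_n))

big_n = 60
theta = 1.0
f_values = [math.cos(theta * t) for t in range(big_n + 1)]
omega = big_n * theta / math.pi          # expected peak position (3-5.19)

for p in [0.0, 5.0, 19.0, 19.5, 40.0, 60.0]:
    print(p, cosine_sum(f_values, p),
          kernel(omega + p, big_n) + kernel(omega - p, big_n))

# Integer p at which |F(p)| is largest; it should lie next to omega.
peak = max(range(big_n + 1), key=lambda k: abs(cosine_sum(f_values, k)))
print(peak, omega)
```

Doubling N doubles the value of `omega` for the same frequency, so that two neighboring frequencies are moved twice as far apart on the p scale.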

This method of finding the eigenvalues of a matrix by looking for 
the hidden frequencies of an assembly of periodic functions can be 
called a “spectroscopic method” since we imitate mathematically the 
operation of a spectroscope. A spectroscope detects the frequencies 
of which the light emission of an excited atom is composed. These 
frequencies can be evaluated from the position of the “spectral 
lines.” The spectral lines are not lines in the mathematical sense but 
have a certain finite width. The distribution law of the amplitudes in 
the neighborhood of a peak is analogous to that given by F(p). The 
only difference is that the excessive accuracy of spectroscopic
measurements is caused by the high persistency of optical vibrations
which corresponds to a very large value of N. We can cut down on
the number of iterations and still maintain high accuracy, because of 
the large number of significant figures we have at our disposal. 
Although the peaks are now much closer together, we may correct 
our preliminary results by evaluating the interference from the 
neighboring peaks and subtracting their action. In this fashion the 
position of the maxima may be ascertained with a relative accuracy 
of 10° and more. 

The successive vectors $b_0, b_1, \ldots, b_N$ need not be printed out in full.
It suffices to print out a single element of each vector, as long as it is
consistently the same element (e.g., the first, second, ...) of each vector.
We thus obtain a one-dimensional sequence of numbers:

$$y_0, y_1, y_2, \ldots, y_N \tag{3-5.23}$$

which can be subjected to a Fourier cosine analysis. Of particular
interest are the integer values p = k of p. The corresponding
functional values F(k) are denoted by $Y_k$.

$$Y_k = \tfrac{1}{2} y_0 + y_1 \cos\frac{\pi k}{N} + y_2 \cos\frac{2\pi k}{N} + \cdots + \tfrac{1}{2} y_N \cos\frac{N\pi k}{N} \tag{3-5.24}$$

If we evaluate these $Y_k$ systematically for k = 0, 1, 2, ..., N, we have
transformed the original sequence (23) into a new sequence,

$$Y_0, Y_1, Y_2, \ldots, Y_N \tag{3-5.25}$$

Let us denote

$$\omega_i = \frac{N}{\pi}\theta_i \tag{3-5.26}$$

Then the relations (20) and (22) yield

$$K(\omega_i - p) = \frac{N}{2}\,\frac{\sin\pi(\omega_i - p)}{\pi(\omega_i - p)}$$

For integer values of p the numerator becomes $(-1)^p \sin\pi\omega_i$ and we
see that, apart from the constantly alternating plus-minus signs, the
interference of one peak on the other is given by the function,

$$\frac{\sin\pi\omega_i}{\pi(\omega_i - \omega_j)} \tag{3-5.27}$$
If by accident a certain peak at $p = \omega_i$ happens to fall on an integer
value $\omega_i = k$, we get a solitary peak without any “tails.” One single
maximum is then flanked by zeros on both sides. If, however, the
maximum does not fall on an integer value, the maximum amplitude
is flanked by smaller amplitudes on both sides. The slowest decrease
occurs if the maximum is exactly halfway between two integer values
of p. The amplitude pattern now becomes

$$\ldots, -\tfrac{1}{7}, +\tfrac{1}{5}, -\tfrac{1}{3}, 1, 1, -\tfrac{1}{3}, +\tfrac{1}{5}, -\tfrac{1}{7}, \ldots$$

This slow decrease of the amplitudes can be considerably speeded up
by operating with the second differences of the original amplitudes.
In view of the alternating ± signs, the desired operation becomes

$$Z_k = Y_{k-1} + 2Y_k + Y_{k+1} \tag{3-5.28}$$

The previous pattern is now changed as follows.

$$\ldots, +\tfrac{1}{315}, -\tfrac{1}{105}, +\tfrac{1}{15}, \tfrac{1}{3}, \tfrac{1}{3}, +\tfrac{1}{15}, -\tfrac{1}{105}, +\tfrac{1}{315}, \ldots$$

The interference diminishes now with the cube of the distance between
two peaks and will generally become negligible, except if two very
near peaks have to be separated.

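The exact position of a maximum, based on second differences
(actually second sums, due to the ± pattern), can be obtained as
follows. We examine the sequence $Y_k$ and pay attention particularly
to the regular ± sequence of signs. At certain points we notice that
this sequence is interrupted by a ++ or -- sequence. We underline
these irregular sequences and we know that a peak must occur
between two such p values, p = k and p = k + 1. We put

$$p = k + \epsilon \tag{3-5.29}$$

We form the ratio of two successive $Z_k$ (cf. 28) values belonging to
p = k and p = k + 1.

$$q = \frac{Z_k}{Z_{k+1}} \tag{3-5.30}$$

By (27), and apart from a common factor, $Z_k$ is proportional to

$$\frac{1}{\epsilon + 1} - \frac{2}{\epsilon} + \frac{1}{\epsilon - 1} = \frac{2}{(\epsilon - 1)\epsilon(\epsilon + 1)}$$

while $Z_{k+1}$ is proportional to

$$-\left(\frac{1}{\epsilon} - \frac{2}{\epsilon - 1} + \frac{1}{\epsilon - 2}\right) = -\frac{2}{(\epsilon - 2)(\epsilon - 1)\epsilon}$$

Forming the ratio, we get

$$q = \frac{Z_k}{Z_{k+1}} = -\frac{\epsilon - 2}{\epsilon + 1}$$

from which

$$\epsilon = \frac{2 - q}{1 + q} \tag{3-5.31}$$

Finally

$$\theta = \frac{\pi}{N}(k + \epsilon) \tag{3-5.32}$$

and

$$\lambda = \sin^2\frac{\theta}{2} = \frac{1 - \cos\theta}{2} \quad [\text{case (9)}], \qquad \lambda = \cos\theta \quad [\text{case (8)}] \tag{3-5.33}$$

and finally going back to the eigenvalues of the original unscaled
matrix A,

$$\lambda_i(A) = s\lambda_i \tag{3-5.34}$$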
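The second-sum procedure can be tried on a single cosine component of known frequency. In the sketch below the peak is located by the largest $|Z_k|$ rather than by inspecting the sign pattern; that shortcut, the values of N and of the frequency, and the function names are assumptions of the illustration, and the sketch presupposes one well-isolated peak away from both ends of the range.

```python
import math

def cosine_transform(y):
    """Y_k of (3-5.24) for k = 0, 1, ..., N."""
    big_n = len(y) - 1
    out = []
    for k in range(big_n + 1):
        total = 0.5 * y[0] + 0.5 * y[big_n] * math.cos(math.pi * k)
        for t in range(1, big_n):
            total += y[t] * math.cos(math.pi * t * k / big_n)
        out.append(total)
    return out

def refine_peak(big_y):
    """Locate one peak p = k + eps from the second sums (3-5.28) to (3-5.31).
    Returns (k, eps)."""
    big_n = len(big_y) - 1
    z = {k: big_y[k - 1] + 2.0 * big_y[k] + big_y[k + 1]
         for k in range(1, big_n)}
    m = max(z, key=lambda k: abs(z[k]))
    # The peak lies between m and whichever neighbor has the larger |Z|.
    k = m if abs(z.get(m + 1, 0.0)) >= abs(z.get(m - 1, 0.0)) else m - 1
    q = z[k] / z[k + 1]                  # (3-5.30)
    eps = (2.0 - q) / (1.0 + q)          # (3-5.31)
    return k, eps

big_n = 180
theta_true = 1.0
y = [math.cos(theta_true * t) for t in range(big_n + 1)]
k, eps = refine_peak(cosine_transform(y))
theta_est = math.pi * (k + eps) / big_n  # (3-5.32)
print(k, eps, theta_est)
```

Without the correction by `eps` the frequency would be known only to within one unit of the scale, i.e. to within $\pi/N$.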


The most surprising feature of this algorithm is that it is entirely 
free of a dangerous accumulation of rounding errors. While in the 
p,q algorithm of § 3 the rounding errors accumulate rapidly and have
to be counteracted by a constant reorthogonalization process, the 
present algorithm allows a continuation to hundreds and perhaps 
even thousands of iterations, without undue danger or distortion from 
the part of rounding errors. Since the “signal’’ increases proportion- 
ally to the number of iterations, a not more than linear increase of 
the rounding errors would leave the “signal-to-noise ratio” un- 
changed. Experiments on small matrices could not detect any 
damage even after 2000 iterations. Experiments on large matrices 
are not yet available, nor is the general statistical behavior of the 
noise associated with this algorithm sufficiently investigated. It 
seems, however, that the great precision obtainable with this method 
in independent determination of eigenvalues and the high resolution 
power in the separation of close eigenvalues will not suffer by an 
increase of the size of the matrix to which it is applied. 

It is convenient to choose N, the number of iterations, as some 
multiple of 180, since the trigonometric tables divide the half-circle 
into 180 parts. If, for example, we take N = 720, we scan the half- 
circle in units of 15'. Since the second sum method gives good
results if two neighboring peaks are at least 4 units apart, we can
obtain with high precision the position of any eigenvalue on the 
circle which is separated from its neighbors by at least 1°. 


6. Generation of the eigenvectors. In the previous section only 
a single component of the vectors $b_\alpha$ was employed. The method for
isolating a definite frequency is a resonance method; we con- 
sistently increase the magnitude of a small periodic component whose 
frequency happens to agree with the impressed frequency. In the 
previous section we were interested in the exact position of the 
maximum only. But the magnitude of the peak is also of interest if 
our aim is to obtain the eigenvector which is associated with a certain 
eigenvalue. The “second sum” method (cf. 5.28) is a valuable tool 
again in isolating the peak from the contaminating influence of the 
neighboring peaks. If we find that a maximum is between p = k and 
p= k + 1, with a larger amplitude at p = k (or between p = k and 
k — 1 with a larger amplitude at p = k), the quantity 


$z_k = y_{k-1} + 2y_k + y_{k+1}$    (3-6.1)

will be proportional to one of the components of the eigenvector $u_i$.
In order to obtain the entire eigenvector $u_i$, we have to repeat the
same calculation for every component.

This again can be done with maximum economy, without printing
out the single vectors $b_\alpha$. We have stored these vectors on tape. We
now multiply each one of these successive vectors by a preassigned
weight factor $\rho_\alpha$ and form the sum, thus obtaining

$\bar u_i = b_0 + \sum_{\alpha=1}^{N-1} \rho_\alpha b_\alpha$    (3-6.2)

where the weights $\rho_\alpha$ are defined as

$\rho_\alpha = \left(1 + \cos\dfrac{\pi}{N}\alpha\right)\cos\dfrac{\pi}{N}k\alpha$    (3-6.3)

Only the final sum $\bar u_i$ is printed out. The vector $\bar u_i$ (the bar refers to
the fact that it is not the exact eigenvector $u_i$ but only a close approxi-
mation of it) is not normalized in length. The contamination from
the part of the other eigenvectors is negligible if the peak near p = k
and the next peak are at least 4 units apart.
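That the weighted sum reproduces the second sum can be verified numerically for a single component. The sketch below assumes that the amplitudes of § 5 are the cosine sums $y_p = \tfrac12 b_0 + \sum_{\alpha=1}^{N-1} b_\alpha \cos(\pi p\alpha/N)$ (an assumption, since that definition lies outside this section); the identity $\cos(k-1)x + 2\cos kx + \cos(k+1)x = 2(1 + \cos x)\cos kx$ then gives $z_k = 2\bar u$. The test sequence `b` is arbitrary.

```python
import math


def cosine_amplitude(b, p, N):
    """Assumed analysis: y_p = b_0/2 + sum_{a=1}^{N-1} b_a cos(pi*p*a/N)."""
    return 0.5 * b[0] + sum(b[a] * math.cos(math.pi * p * a / N)
                            for a in range(1, N))


def weighted_sum(b, k, N):
    """b_0 + sum rho_a b_a with rho_a = (1 + cos(pi*a/N)) * cos(pi*k*a/N)."""
    return b[0] + sum((1.0 + math.cos(math.pi * a / N))
                      * math.cos(math.pi * k * a / N) * b[a]
                      for a in range(1, N))


N, k = 16, 5
# One component of the stored vectors b_0, ..., b_{N-1} (arbitrary test data).
b = [math.cos(0.9 * a) + 0.3 * math.cos(2.1 * a) for a in range(N)]

z_k = (cosine_amplitude(b, k - 1, N)
       + 2 * cosine_amplitude(b, k, N)
       + cosine_amplitude(b, k + 1, N))
u_bar = weighted_sum(b, k, N)
print(z_k, 2 * u_bar)
```

Applying `weighted_sum` to every component in turn gives the whole unnormalized vector in one pass over the stored sequence.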




By this method any particular eigenvector of A or all of them may
be generated. If A is symmetric we can test the $\bar u_i$ thus obtained for
orthogonality. If A is not symmetric, we have to operate simultane-
ously with A and $\tilde A$, repeating the entire process with another trial
vector $b_0^*$ and replacing A by $\tilde A$. Again the $b_\alpha^*$ are stored on tape, but
the weighting (2) occurs once more in identical fashion. The resulting
sum is again printed out and gives our $\bar v_i$. The vectors $\bar u_i$ and $\bar v_i$ must
now show the biorthogonality relation (2-6.16).


7. Iterative solution of large-scale linear systems. Application of 
the Chebyshev polynomials to the eigenvalue problem of matrices 
with real eigenvalues leads to a method of solving large-scale linear 
systems by successive iterations, without actual matrix inversion. 
We encounter the preliminary difficulty that the Chebyshev poly-
nomials operate properly only in the real range, while the eigenvalues
of an arbitrary nonsymmetric matrix A are generally complex 
numbers. There are two ways in which to overcome this difficulty. 
One is that we reformulate the given equation 


Ax = b (3-7.1) 
according to the method of least squares. 
A*Ax = A*b (3-7.2) 


The new matrix A*A is “positive definite,” i.e., its eigenvalues are all
real and even positive numbers. If the original matrix A was properly
scaled in rows and columns (cf. II, 26), the diagonal elements of A*A
will be nearly 1, and the nondiagonal elements will lie between ±1.

The symmetrization of A by premultiplication by A* is a simple but
elaborate operation. It is equivalent to n iterations, and if n is high,
this operation is by no means trivial. Moreover, if A itself has many
zeros, this is no longer true of A*A, and thus a great practical
advantage of the original matrix A may be lost. Hence it is sometimes
preferable not to generate the matrix A*A explicitly, but to obtain
the operation $b_1 = A^*Ab_0$ in two steps, viz.,

$b_1' = Ab_0, \quad b_1 = A^*b_1'$    (3-7.3)
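The two-step operation can be sketched in a few lines. This is a minimal illustration with matrices stored as lists of rows; the helper names `matvec`, `adjoint_matvec`, and `normal_matvec` are chosen here and are not part of the text.

```python
def matvec(A, v):
    """Return A v for a matrix given as a list of rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


def adjoint_matvec(A, v):
    """Return A* v (conjugate transpose times v) without building A*."""
    n_cols = len(A[0])
    return [sum(A[i][j].conjugate() * v[i] for i in range(len(A)))
            for j in range(n_cols)]


def normal_matvec(A, b0):
    """Return A*A b0 in two steps, never forming the product A*A."""
    b1_prime = matvec(A, b0)            # b1' = A b0
    return adjoint_matvec(A, b1_prime)  # b1  = A* b1'


A = [[1, 2], [0, 3]]
b1 = normal_matvec(A, [1, 1])
print(b1)
```

If A has many zeros, both steps can skip them, which is the practical advantage that would be lost in forming A*A explicitly.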


However, there is an altogether different way for symmetrization of
A which avoids premultiplication by A*, at the cost of doubling the
size of the matrix. Let us enlarge the given linear system as follows:


$Ax = b, \quad A^*y = 0$    (3-7.4)

considering

$z = (y, x)$    (3-7.5)

as a vector of 2n components. On the surface, addition of the second
equation appears an unnecessary luxury, since it has the trivial
solution y = 0. However, the extended matrix

$B = \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}$    (3-7.6)

has the great advantage that it is always Hermitian.

$B^* = B$    (3-7.7)


Hence the eigenvalues of B are always real. The application of the 
Gersgorin method to the new matrix leads to interesting 
consequences. In §5 we found that an upper bound on the eigen- 
values of A could be found by considering the absolute sum of each 
row and selecting the maximum of these n numbers, or the absolute
sum of each column and selecting the maximum of these n numbers.
The smaller of these two values could be chosen as our s. Now the
larger of these two numbers has to be chosen as the s of the enlarged
matrix (6), and we will put

$C = \dfrac{1}{s}B$    (3-7.8)

The eigenvalues of C will again lie between the limits ±1.
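The choice of s for the enlarged matrix can be illustrated on a small example. The sketch below builds B from a real 2 × 2 matrix chosen here for illustration, and compares the largest absolute row sum of B with the larger of the two bounds obtained from A; all names are assumptions made for the example.

```python
def abs_row_sums(M):
    return [sum(abs(x) for x in row) for row in M]


def abs_col_sums(M):
    return [sum(abs(M[i][j]) for i in range(len(M))) for j in range(len(M[0]))]


def extended_matrix(A):
    """Build B = [[0, A], [A*, 0]] for a square matrix A (list of rows)."""
    n = len(A)
    zero = [0.0] * n
    top = [zero + list(row) for row in A]                       # (0   A)
    bottom = [[A[i][j].conjugate() for i in range(n)] + zero    # (A*  0)
              for j in range(n)]
    return top + bottom


A = [[2.0, -1.0], [0.5, 3.0]]
B = extended_matrix(A)

# Larger of the row-sum bound and the column-sum bound of A.
s = max(max(abs_row_sums(A)), max(abs_col_sums(A)))
print(s, max(abs_row_sums(B)))
```

The rows of the upper block of B carry the row sums of A and the rows of the lower block carry its column sums, which is why the larger of the two numbers must be taken.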


But this same s can serve still another purpose. The eigenvalue
problem of the matrix B leads to the equations

$Ax = \rho y, \quad A^*y = \rho x$    (3-7.9)

which gives

$A^*Ax = \rho^2 x$    (3-7.10)

This shows that the 2n eigenvalues of the matrix B are the square
roots $\pm\sqrt{\lambda_i}$ of the eigenvalues $\lambda_i$ of A*A. Since we have bounded the
absolutely largest eigenvalue of B by s, we now find

$\lambda_M \le s^2$    (3-7.11)

as an estimated upper bound of A*A, without actually generating the
elements of this matrix. Moreover, the matrix

$S_0 = \dfrac{1}{s^2}A^*A$    (3-7.12)

will have eigenvalues which lie between 0 and 1.

For our following discussions we will assume that we have sym- 
metrized A by premultiplication, and not by the extension method 
(4), although the resulting algorithm can easily be reinterpreted for 
the case (4). 

We will formulate our problem as an eigenvalue problem whose
solution we have found in § 5. For this purpose we write the given
equation (1) in the form

$S_0x + c = 0$    (3-7.13)

with

$c = -\dfrac{1}{s^2}A^*b$    (3-7.14)

In many problems arising from self-adjoint differential equations we
have a matrix which from the very beginning possesses only positive
eigenvalues. In that case no symmetrization is needed and we can
put directly

$S_0 = \dfrac{1}{s}A, \quad c = -\dfrac{1}{s}b$    (3-7.15)


Now our linear set of equations in n unknowns:

$s_{11}x_1 + s_{12}x_2 + \cdots + s_{1n}x_n + c_1 = 0$
$\cdots$
$s_{n1}x_1 + s_{n2}x_2 + \cdots + s_{nn}x_n + c_n = 0$

can be reformulated as a homogeneous set of equations in n + 1
unknowns:

$s_{11}x_1 + s_{12}x_2 + \cdots + s_{1n}x_n + c_1x_{n+1} = 0$
$\cdots$
$s_{n1}x_1 + s_{n2}x_2 + \cdots + s_{nn}x_n + c_nx_{n+1} = 0$

This means that we add the right side c as an additional column to
our matrix $S_0$. We thus get an extended (n + 1) by (n + 1) matrix S,
and the vector x takes on one more element.

$S = \begin{pmatrix} S_0 & c \\ 0 & 0 \end{pmatrix}, \quad u_0 = (x_1, x_2, \ldots, x_n, x_{n+1})$    (3-7.16)

The equation (13) is now reformulated as the following homogeneous
system of (n + 1) equations in (n + 1) unknowns:

$Su_0 = 0$    (3-7.17)

Since an arbitrary scale factor remains free in $u_0$, we may normalize
this scale factor in such a way that $x_{n+1}$ shall become 1. Then the
$x_1, x_2, \ldots, x_n$ become identical with the desired solution x of the
original set (13). However, we can also leave the scale factor
arbitrary and agree that at the end we will form the ratios

$\dfrac{x_1}{x_{n+1}},\ \dfrac{x_2}{x_{n+1}},\ \ldots,\ \dfrac{x_n}{x_{n+1}}$

These ratios will give the solution of the original inhomogeneous
equations. The two systems (13) and (17) are thus equivalent, and from
now on we will operate with the homogeneous system (17).

The n eigenvalues and eigenvectors of the matrix $S_0$ hold also for
the extended system, if the eigenvectors are complemented by the
element $x_{n+1} = 0$. However, the extended matrix S has n + 1
eigenvalues and eigenvectors. The (n + 1)st eigenvalue of S is
λ = 0, and the associated eigenvector is defined by the equation

$Su_0 = \lambda u_0 = 0$    (3-7.18)

The notation $u_0$ for the solution of the homogeneous set (17) was
chosen because this solution has the significance of one of the eigen-
vectors or principal axes of the extended matrix S. The solution of
a homogeneous system of equations can thus be reformulated as a
special case of an eigenvalue problem. Let λ = 0 be known as one
of the eigenvalues of a given matrix, and find the eigenvector
which belongs to the eigenvalue λ = 0.


We will display the complete eigenvalue analysis of the matrix S
and the transposed matrix $\tilde S$. In accordance with our usual procedure
we put the components of the successive eigenvectors in successive
columns of a matrix. In our problem the mutually orthogonal
principal axes of $S_0$ are taken for granted. They are denoted by
$u_1, u_2, \ldots, u_n$. The additional (n + 1)st element is indicated
separately.

              S                                  S̃ (transposed)
  λ = 0   λ_1   λ_2   ...   λ_n      λ = 0   λ_1          λ_2          ...   λ_n
    x     u_1   u_2   ...   u_n        0     u_1          u_2          ...   u_n
    1      0     0    ...    0         1     c·u_1/λ_1    c·u_2/λ_2    ...   c·u_n/λ_n
                                                                          (3-7.19)

We start with a trial vector $b_0$, chosen as follows.

$b_0 = (0, 0, \ldots, 0, 1)$    (3-7.20)

We analyze it in the reference system of the principal axes of S.

$b_0 = \beta_0u_0 + \beta_1u_1 + \cdots + \beta_nu_n$    (3-7.21)

The coefficients $\beta_i$ are obtained by forming the scalar product of $b_0$
with the adjoint axes. Hence

$\beta_0 = 1, \quad \beta_i = \dfrac{c\cdot u_i}{\lambda_i}$    (3-7.22)

We let some properly chosen polynomial $P_m(S)$ operate on $b_0$. In
view of (2.13), we obtain

$P_m(S)b_0 = P_m(0)u_0 + \beta_1P_m(\lambda_1)u_1 + \cdots + \beta_nP_m(\lambda_n)u_n$    (3-7.23)


Our goal is to obtain the vector $u_0$ alone without any contamina-
tion from the other vectors $u_i$. Hence our aim will be to search for a
polynomial $P_m(\lambda)$ which should take the value 1 for λ = 0 and should
remain small for all other values of λ between 0 and 1. Although a
polynomial cannot drop from 1 to 0 suddenly, we may succeed in
constructing a polynomial which will decrease steeply from the
initial value y = 1 at λ = 0 and then remain uniformly small. The
Chebyshev polynomials of the second kind will give an adequate
solution of this problem.

We construct a sequence of vectors $b_1, b_2, \ldots, b_m$ by the following
simple iterative routine. We form the matrix

$C = 2I - 4S$

and generate the $b_k$ vectors by a uniform recurrence scheme which
at each step involves two vectors, viz., the last vector $b_k$ and the
previous vector $b_{k-1}$.

$b_{k+1} = Cb_k - b_{k-1}$    (3-7.24)

The scheme starts with the two vectors

$b_1 = b_0, \quad b_2 = Cb_1$

Then the normal routine follows:

$b_3 = Cb_2 - b_1$

and so on. Finally, after arriving at a certain $b_m$, we divide that
vector by m. The polynomial operator thus generated can be
written

$P_m(\lambda) = \dfrac{\sin m\theta}{m\sin\theta}, \quad \lambda = \sin^2\dfrac{\theta}{2}$    (3-7.25)

This function has for small θ the character of a universal function of
ξ, if ξ is defined by

$\xi = 4m^2\lambda$    (3-7.26)

The polynomial $P_m(\lambda)$, considered as a function of ξ, is in good
approximation equal to

$\varphi(\xi) = \dfrac{\sin\sqrt{\xi}}{\sqrt{\xi}}$    (3-7.27)

This function begins with y = 1 at ξ = 0, decreases steeply with
increasing ξ, but develops secondary maxima and minima. As m,
i.e., the number of iterations, increases, a certain eigenvalue $\lambda_i$ moves
out along the curve with the speed m², according to the relation (26).
If we increase the number of iterations, even a very small $\lambda_i$ will
eventually move out to large values of ξ at which φ(ξ) is already very
small. But this process has very slow convergence if the given 
linear system is strongly skew-angular. In order to avoid an exces- 
sive number of iterations, it is preferable to terminate our recurrence 
scheme after m steps and then repeat the entire cycle a second time, 
and possibly a third and fourth time. This can be done with no 
difficulty by a slight modification of our routine. 
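The quality of the approximation of the polynomial by the universal function can be examined numerically. The sketch below runs the recurrence on a single eigenvalue (so that the matrix C = 2I − 4S is replaced by the number 2 − 4λ) and compares the result with sin √ξ / √ξ, ξ = 4m²λ; the function names and the sample values m = 128, λ = 10⁻⁴ are chosen here for illustration.

```python
import math


def p_m(lam, m):
    """Scalar form of the recurrence: b_1 = 1, b_2 = c, b_{k+1} = c*b_k - b_{k-1},
    with c = 2 - 4*lam, followed by division by m."""
    c = 2.0 - 4.0 * lam
    prev, cur = 1.0, c
    for _ in range(m - 2):
        prev, cur = cur, c * cur - prev
    return cur / m


def phi(xi):
    """Universal function sin(sqrt(xi)) / sqrt(xi)."""
    root = math.sqrt(xi)
    return math.sin(root) / root if root > 0.0 else 1.0


m = 128
lam = 1.0e-4
xi = 4 * m * m * lam
print(p_m(lam, m), phi(xi))
```

For λ = 0 the recurrence produces 1, 2, 3, …, m, so that the division by m restores the value 1 exactly.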


[Figure: graph of the function $\varphi(\xi) = \sin\sqrt{\xi}/\sqrt{\xi}$]


It is advisable to choose m, the length of each cycle, as some power
of 2, e.g., 128 = 2^7. Then the division by m is accomplished by a
mere shift of 7 binary digits. We have thus obtained

$b_1, b_2, \ldots, b_m, \quad \bar b_m = b_m/m$

and now we continue with our routine by choosing $\bar b_m$ as the starting
vector $b_{m+1}$ of the second cycle. The interference with the regular
routine occurs only in one step, viz., in the formation of $b_{m+2}$:

$b_{m+2} = Cb_{m+1}$    (3-7.28)

where the subtraction of $b_m$ is prevented. But immediately afterward
we return to the normal routine:

$b_{m+3} = Cb_{m+2} - b_{m+1}$

and continue undisturbed, until $b_{2m}$ and

$\bar b_{2m} = b_{2m}/m$

is reached. Then the third cycle starts and thus we go on, until ν
cycles are finished. The resultant polynomial operator $Q_N(\lambda)$ becomes

$Q_N(\lambda) = [P_m(\lambda)]^\nu$    (3-7.29)
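The complete routine, including the repetition of cycles, may be sketched as follows. This is a minimal illustration in plain Python and makes two assumptions: the extended matrix S is taken as $S_0$ bordered by the column c and a last row of zeros, and the eigenvalues of $S_0$ lie strictly between 0 and 1. The function name `c_iterations`, the 2 × 2 test system, and the choice m = 16 with four cycles are illustrative only.

```python
def matvec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def c_iterations(S0, c, m=16, cycles=4):
    """Approximate the solution of S0 x + c = 0 by repeated cycles of C iterations."""
    n = len(S0)
    # Extended matrix S: c appended as a column, plus a zero last row (assumed form).
    S = [list(S0[i]) + [c[i]] for i in range(n)] + [[0.0] * (n + 1)]

    def apply_C(v):
        # C v = (2I - 4S) v
        return [2.0 * vi - 4.0 * si for vi, si in zip(v, matvec(S, v))]

    b_bar = [0.0] * n + [1.0]              # trial vector b_0 = (0, ..., 0, 1)
    for _ in range(cycles):
        prev, cur = b_bar, apply_C(b_bar)  # first step of a cycle: no subtraction
        for _ in range(m - 2):             # normal routine up to b_m
            nxt = [p - q for p, q in zip(apply_C(cur), prev)]
            prev, cur = cur, nxt
        b_bar = [t / m for t in cur]       # divide the last vector of the cycle by m
    # The last element carries the free scale factor; form the ratios.
    return [t / b_bar[-1] for t in b_bar[:-1]]


S0 = [[0.5, 0.1], [0.1, 0.4]]
c = [-0.6, -0.5]                           # chosen so that the solution is x = (1, 1)
x = c_iterations(S0, c)
print(x)
```

Only products of S with a vector are needed, so a matrix with many zeros keeps its advantage throughout.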


Let us see how the length m of each cycle and the number ν of the
repetitions of the cycle are related to the accuracy of the solution
obtained. For this purpose we substitute the last vector $\bar b_N$ in the
given equation (17) and form the residual. We denote

$\nu m = N$

Then

$r = S\bar b_N = \beta_1Q_N(\lambda_1)\lambda_1u_1 + \beta_2Q_N(\lambda_2)\lambda_2u_2 + \cdots + \beta_nQ_N(\lambda_n)\lambda_nu_n$
$\phantom{r} = (c\cdot u_1)Q_N(\lambda_1)u_1 + (c\cdot u_2)Q_N(\lambda_2)u_2 + \cdots + (c\cdot u_n)Q_N(\lambda_n)u_n$    (3-7.30)

Now $(c\cdot u_1), (c\cdot u_2), \ldots, (c\cdot u_n)$ are the components of the vector c in the
reference system of the principal axes of $S_0$. If we think of our
solution once more in terms of the inhomogeneous equation

$S_0x = -c$

and put

$x = \bar b_N + y$    (3-7.31)

we obtain for the correction y the new equation

$S_0y = -r$    (3-7.32)

Even without evaluating y we can predict theoretically how accurate
our solution $\bar b_N$ will be. The vectors $u_1, u_2, \ldots, u_n$ are orthogonal to
each other. If we assume that the eigenvalues $\lambda_i$ are arranged in
increasing order, the smallest eigenvalue will be $\lambda_1$; the largest error
will occur in the direction of the associated eigenvector $u_1$. The
factor by which we have cut down the component of the vector c
in this direction is $Q_N(\lambda_1)$. Now let us assume that the smallest
eigenvalue $\lambda_1$ is beyond a certain critical value $\lambda_0$, defined by

$\lambda_0 = \dfrac{\xi_0}{4m^2}$    (3-7.33)

where

$\xi_0 = (2.554)^2 = 6.523$    (3-7.34)

This particular value of ξ is chosen in view of the first secondary
maximum of the function (sin x)/x which is at x = 4.4934. The
value of this maximum is −0.2172. The same amplitude with a
positive sign was reached earlier at the point x = 2.5536. Beyond
this point the function φ(ξ) never rises above 0.22. Hence we can say
that if the smallest eigenvalue $\lambda_1$ of our matrix $S_0$ does not drop
below a certain critical value,

$\lambda_1 \ge \left(\dfrac{1.277}{m}\right)^2$    (3-7.35)

the relative error of our solution will be

$\eta \le (0.22)^\nu$    (3-7.36)


This “relative error” means the following. We cannot measure the
accuracy of the solution of a linear system by comparing the calculated
value of a certain component with its true value. The true value of a
certain component $x_i$ may accidentally be very small and even zero,
while another component $x_k$ may be very large. A fair measure of the
accuracy is the length of the error vector divided by the length of the
true vector, i.e., the square root of the sum of the squares of all the
individual errors, divided by the square root of the sum of the
squares of all the unknowns $x_i$:

$\eta = \sqrt{\dfrac{(\delta x_1)^2 + (\delta x_2)^2 + \cdots + (\delta x_n)^2}{x_1^2 + x_2^2 + \cdots + x_n^2}}$    (3-7.37)

Here $\delta x_i$ denotes the difference between the true value $x_i$ and the
calculated value $\bar x_i$.

The relation (36) shows that the number of repeated cycles decides
the accuracy of our solution. Two cycles will give us an accuracy of
4.72%, three cycles an accuracy of 1.02%, and four cycles an accuracy
of 0.22%. These are estimated accuracies, and an error estimation
has to be based on the worst possibility. The actual error may be
much smaller, since the $\lambda_i$ of our matrix may be nearer to the zeros
than to the maxima of the previous graph.

The relation (35), on the other hand, shows that the length of each
cycle decides the separation power of the method. In the case of a
very small $\lambda_1$ the minimum number of iterations demanded for
successful separation of the true solution from the contribution of
the smallest eigenvector may become quite large. We have seen,
however, that physical systems with a measured right side will
hardly allow a $\lambda_1$ which drops below $10^{-4}$ (otherwise the noise of the
measurements will drown out the mathematically correct solution).
But then a cycle of m = 128 = 2^7 iterations will be sufficient to
purify the solution from all contaminations caused by the eigen-
vectors of $S_0$. Moreover, the accuracy of the solution needs hardly
surpass 5%, because of the inaccuracies of the physical observations.
We can thus say that a double cycle of 128 iterations will cover
practically all physical systems, irrespective of the size of the matrix.
For the sake of brevity we will call the scheme (24) of generating a set
of vectors by the name “C iterations.”
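The numerical constants of this section are easily rechecked. The sketch below evaluates (sin x)/x at the two quoted points, the worst-case error after ν cycles using the amplitude 0.2172, and the critical eigenvalue (1.277/m)² for m = 128; the function names are chosen here for illustration.

```python
import math


def sinc(x):
    """(sin x)/x, i.e. the universal function with x = sqrt(xi)."""
    return math.sin(x) / x


def error_bound(cycles):
    """Worst-case relative error after the given number of cycles."""
    return 0.2172 ** cycles


def critical_lambda(m):
    """Smallest eigenvalue still covered by cycles of length m."""
    return (1.277 / m) ** 2


print(sinc(2.5536), sinc(4.4934))
for nu in (2, 3, 4):
    print(nu, round(100 * error_bound(nu), 2), "percent")
print(critical_lambda(128))
```

With m = 128 the critical value falls just below 10⁻⁴, in agreement with the statement that a cycle of 128 iterations suffices for physical systems.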


8. The residual test. If we have made a complete eigenvalue 
analysis of the matrix $S_0$, we can tell in advance from the length of the
cycles of the previous algorithm and the number of cycles employed, 
what the minimum accuracy (maximum error) of our solution 
will be. The actual accuracy may be much more optimistic. Hence 
we would like to find out how close our solution comes to the 
mathematically perfect solution. The most natural method of 
finding out the accuracy of a solution would seem to be sub- 
stitution of the alleged solution into the given equation. If we 
find that the given equation is satisfied, we know that our answer 
is correct. 


¹ The polynomial (15) has a strong maximum not only at θ = 0, but also at
θ = π. This means that the spotlight is put on the neighborhood of λ = 0 and
λ = 1. Hence the eigenvalue spectrum must not extend completely up to λ = 1.
The Gersgorin estimation usually overestimates the largest eigenvalue so strongly
that the danger that $S_0$ has an eigenvalue at λ = 1 is very slight. However, in
order to be on the safe side, we may define the matrix (3) as A*A divided by 1.05s².


This method, however, has its great dangers. It is true that we can 
test in this way a solution of absolute accuracy. If we substitute in 
the expression $S_0x + c$ and find that the residual vector

$r = S_0x + c$    (3-8.1)

comes out zero, we know that we have found the proper x. This will
happen only very occasionally and usually only in particularly
simple but not characteristic examples in which the elements of the
matrix $S_0$ are chosen as simple integers. In most cases the limited
accuracy of numerical calculations will prohibit a perfect answer. 
Hence we cannot expect that the residual vector r, obtained as a 
result of substituting the obtained solution into the given equation, 
will be truly zero, but only that it will be very small. Yet it is one of 
the paradoxes of linear systems that this simple test—we may call it 
the “residual test’’—need not be a true measure of the accuracy of 
our solution. While a vanishing residual indicates a perfect answer, a 
small residual need not indicate a close answer. 

The following example may illustrate the situation. Let us assume
that the largest eigenvalue of $S_0$ is $\lambda_n = 1$, the smallest eigenvalue
$\lambda_1 = 10^{-4}$. The true solution of our problem will be

$x = u_1 + u_n$

i.e., the smallest and the largest eigenvector participate in the solu-
tion with equal strength. However, we are not aware of the eigen-
values and eigenvectors of $S_0$. We have merely found by some
numerical procedure a solution which happens to be

$x' = u_1 + 1.01u_n$

On the other hand, by some other numerical procedure we have found
another answer which happens to be

$x'' = 4u_1 + u_n$

In the first case the error vector is

$x' - x = 0.01u_n$

In the second case the error vector is

$x'' - x = 3u_1$

Hence the relative error is in the first case,

$\dfrac{0.01}{\sqrt{1+1}} = 0.0071$

and in the second case,

$\dfrac{3}{\sqrt{2}} = 2.12$

That is, the first solution is in error by less than 1%, the second by
more than 200%. But making the residual test we find in the first
case,

$r' = 0.01u_n$

in the second case,

$r'' = 0.0003u_1$


If we go by the residual test, the odds are all against the first solution,
since its residual is 33 times larger. We would thus gladly accept the
second solution as the superior one. And yet the first solution is a
very satisfactory solution, the second solution a completely useless
solution. Hence we see that the mere residual test, without any
further investigation, is no measure of the accuracy or inaccuracy
of a given solution, because the error in the direction of a small
eigenvector is multiplied by a very small factor and thus practically
obliterated, although that error is present in the solution itself,
before multiplying by the matrix $S_0$.
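The example can be reproduced with a diagonal 2 × 2 matrix whose principal axes are the coordinate axes. This is a sketch under that simplifying assumption; the names `rel_error` and `residual` are chosen here for illustration.

```python
import math

lam_1, lam_n = 1.0e-4, 1.0
S0 = [[lam_1, 0.0], [0.0, lam_n]]       # principal axes u_1 = (1, 0), u_n = (0, 1)
x_true = [1.0, 1.0]                      # x = u_1 + u_n
c = [-lam_1 * x_true[0], -lam_n * x_true[1]]   # so that S0 x + c = 0


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def residual(x):
    """r = S0 x + c."""
    return [sum(S0[i][j] * x[j] for j in range(2)) + c[i] for i in range(2)]


def rel_error(x):
    """Length of the error vector divided by the length of the true vector."""
    return norm([a - b for a, b in zip(x, x_true)]) / norm(x_true)


for label, x in (("first", [1.0, 1.01]), ("second", [4.0, 1.0])):
    print(label, rel_error(x), norm(residual(x)))
```

The second answer shows the smaller residual although its relative error exceeds 200%.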

The residual test can be made reliable, however, if we do not stop 
after obtaining the residual, but use this residual as a new right side 
and repeat the entire procedure a second time. Finally we form the 
residual once more. If the new residual is greatly diminished 
compared with the previous one, this is an indication that we did 
find a solution of the given problem. If, on the other hand, the 
residual remains essentially unchanged at the end of the new cycle, 
this indicates the presence of a very small eigenvalue with which our 
iteration process was unable to cope. 


9. The smallest eigenvalue of a Hermitian matrix. The iterative 
procedure of § 7 gives a new attack on the problem of small eigen-
values. Ordinarily, if we want the smallest eigenvalues of a matrix,
we first invert that matrix and then look for the largest eigenvalues.
For this purpose we generate the vectors (cf. 33.8):

$b_0,\ A^{-1}b_0,\ A^{-2}b_0,\ \ldots$    (3-9.1)

and the corresponding vectors of the adjoint system. But the
equation $b_1 = A^{-1}b_0$ is equivalent to the equation

$Ab_1 = b_0$    (3-9.2)

and since we can solve this equation by successive iterations, the
inversion of the matrix A is no longer demanded. Then, after
finding $b_1$, we replace $b_0$ by $b_1$ and repeat the procedure. We thus
obtain the solution of the equation

$Ab_2 = b_1$    (3-9.3)


It is true that we have obtained the iterative solution of linear 
equations only for the case of positive definite matrices. But the 
matrices appearing in vibration problems are usually of the positive 
definite type. Moreover, if A is an arbitrary generally complex 
matrix whose smallest eigenvalue is to be determined, this eigen-
value problem is closely related to an associated problem, involving
the Hermitian matrix

$S_0 = A^*A$    (3-9.4)

Although A itself has complex elements, the matrix $S_0$ has always
real eigenvalues, since the Hermitian condition

$S_0^* = S_0$    (3-9.5)

is satisfied (cf. 2-8.29). Moreover, in view of the least-square
character of the matrix (4), its eigenvalues are not only real, but even
positive. The equation

$S_0b_1 = b_0$    (3-9.6)


is thus always solvable by the C iterations discussed in § 7. 

In the realm of very small eigenvalues we did not succeed with the
inversion problem. This, however, is no serious handicap. The
principal aim of the inversion is to reverse the order of magnitude
of the eigenvalues by making the small eigenvalues large and the
large eigenvalues small. Our process achieved this aim. With two
cycles of m iterations we have generated the following function of
the eigenvalue λ:

$\mu = \dfrac{1}{\lambda}\left[1 - \dfrac{\sin^2\sqrt{\xi}}{\xi}\right] \qquad (\xi = 4m^2\lambda)$    (3-9.7)

This function differs from $\lambda^{-1}$ only in the realm of small ξ, i.e., in the
realm of very small λ. But the function (7) is a legitimate substitute
for the function $\lambda^{-1}$ even in this realm. We generate the scalars

$c_0 = b_0b_0^*, \quad c_1 = b_0b_1^*, \quad c_2 = b_0b_2^* = b_1b_1^*, \quad c_3 = b_1b_2^*$    (3-9.8)

These scalars are all real, although the vectors $b_i$ are complex. Then
the quadratic equation

$\begin{vmatrix} 1 & \mu & \mu^2 \\ c_0 & c_1 & c_2 \\ c_1 & c_2 & c_3 \end{vmatrix} = 0$    (3-9.9)

establishes the two largest eigenvalues of the “reversed” matrix.
The larger of the two roots $\mu_1 > \mu_2$ is chosen, in order to obtain the
smallest eigenvalue of A. The corresponding eigenvector $u_1$ is given
by

$u_1 = b_2 - \mu_2b_1$    (3-9.10)


This method operates even better if we start with a trial vector
which has boosted up the smallest eigenvalue. We can do that by
applying two or three double cycles of C iterations to the random
vector $r_0$. Let the vector thus obtained be $w_0$, and let us apply one
more double cycle of C iterations to it, obtaining $w_1$. This vector $w_0$ can be con-
sidered as a good approximation of the eigenvector $u_1$ and then the
ratio

$\mu = \dfrac{w_0w_1^*}{w_0w_0^*}$    (3-9.11)

gives the transformed value of the eigenvalue λ.

The transformation back from μ to λ can be accomplished as
follows. We first evaluate

$v = \dfrac{2m}{\pi\sqrt{\mu}}$    (3-9.12)

Then we obtain u by solving the equation

$\dfrac{u}{\sqrt{1 - \left(\dfrac{\sin\pi u}{\pi u}\right)^2}} = v$    (3-9.13)

This can be done by tabulation (cf. Table III), for v < 5. Beyond
v = 5 we can put practically

$u = v\left[1 - \dfrac{1}{2}\left(\dfrac{\sin\pi v}{\pi v}\right)^2\right]$    (3-9.14)

After finding u, we obtain λ on the basis of the relation

$\lambda = \left(\dfrac{\pi}{2m}u\right)^2$    (3-9.15)
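In place of the tabulation, the transformation from μ back to λ can be carried out by bisection. The sketch below rests on assumed forms of the relations: the function generated by a double cycle is taken as $\mu = [1 - \sin^2\sqrt{\xi}/\xi]/\lambda$ with $\xi = 4m^2\lambda$, from which $v = 2m/(\pi\sqrt{\mu})$, $u/\sqrt{1 - (\sin\pi u/\pi u)^2} = v$, and $\lambda = (\pi u/2m)^2$ follow. The function names are illustrative only.

```python
import math


def mu_of_lambda(lam, m):
    """Assumed function generated by two cycles: (1 - sin^2(sqrt(xi))/xi) / lam."""
    xi = 4.0 * m * m * lam
    return (1.0 - math.sin(math.sqrt(xi)) ** 2 / xi) / lam


def g(u):
    """Left side of the equation for u: u / sqrt(1 - (sin(pi u)/(pi u))^2)."""
    t = math.pi * u
    return u / math.sqrt(1.0 - (math.sin(t) / t) ** 2)


def lambda_of_mu(mu, m):
    """Invert mu_of_lambda by bisection on g(u) = v, then lam = (pi*u/(2m))^2."""
    v = 2.0 * m / (math.pi * math.sqrt(mu))
    lo, hi = 1.0e-3, max(v, 1.0)      # g(u) >= u, hence the root cannot exceed v
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < v:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    return (math.pi * u / (2.0 * m)) ** 2


m = 128
lam = 1.0e-4
mu = mu_of_lambda(lam, m)
print(mu, lambda_of_mu(mu, m))
```

The bisection relies on g(u) increasing steadily with u, which holds for the assumed form of μ since μ decreases as λ grows.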


10. The smallest eigenvalue of an arbitrary matrix. The problem 
of finding the smallest eigenvalue of an arbitrary matrix is of far- 
reaching significance. It may happen that we are interested in one 
particular eigenvalue of A which is neither the largest nor the smallest. 
We have, however, a fairly good approximation $\lambda_0$ of λ at our disposal.
Hence we know that the desired λ differs from $\lambda_0$ by a small amount
ε only. The question is how to obtain ε.

The boosting up of one particular eigenvalue is not an easy task, 
since in the customary iteration techniques the larger eigenvalues 
will inevitably overshadow the effect of the critical one. The following 
method is free of this difficulty. 

Since

$\lambda = \lambda_0 + \varepsilon$    (3-10.1)

the matrix $A - \lambda_0I - \varepsilon I$ has a zero eigenvalue. But this means that
the matrix

$A_0 = A - \lambda_0I$    (3-10.2)

has the small eigenvalue ε. This ε is generally complex even if $A_0$
was given as real. It will thus be necessary to obtain a quadratic
approximation of ε.

We will again think of the reciprocal matrix $A_0^{-1}$ and try to obtain
its largest eigenvalue by the method (3.9). For this purpose, however,
we have to start with a trial vector $b_0$ which is strongly biased in
favor of the largest eigenvalue. Now we argue as follows: if ε is
neglected and $A_0$ has the eigenvalue zero, then the symmetrized
Hermitian matrix $S_0 = A_0^*A_0$ has also the eigenvalue zero. Moreover,
the eigenvector associated with the zero eigenvalue is common to both
matrices. But this means that even if ε is not zero but small, the
eigenvector $x_0$ of $S_0$ associated with the smallest eigenvalue will have a
strong component in the direction of the desired eigenvector of $A_0$.

Let us therefore obtain the smallest eigenvector of the Hermitian
matrix $S_0$, employing the method of the previous section.

$S_0x_0 = \rho_0x_0$    (3-10.3)


We either boost up an arbitrary random vector by several C iterations
and consider the resulting vector directly as our $x_0$, or we make use of
the more refined quadratic approximation (9.9). At all events we
have now a starting vector $b_0 = x_0$. Now we need a similar starting


vector for Ay. The symmetrized matrix here is another Sy, namely, 


Sj = AgAy (3-10.4) 
But now the equation (3), if we premultiply by Aj, gives 
AACA = PoAc%o (3-10.5) 
and changing i to —i, 
(A940) 45% = PoAoto (3-10.6) 
This shows that 
Ty = Ap% (3-10.7) 


We now have a pair of starting vectors. If $\rho_0$ is very small, a linear
approximation will suffice to obtain ε. In this case we get directly


eh * 
Ly AXy LyX 
E ne 


(3-10.8) 


To'o ° ty AT 
But if pọ is not very small, we proceed to the construction of z on the 
basis of Ax, = 2, which means 


solvable by the previous iteration technique. For this purpose we go
with N high enough to move $\rho_0$ in the safe zone: $\rho_0 \ge \lambda_0$ (cf. 7.33).
Then we proceed as follows. We consider our trial vector 2,2» 
not as bobo, but as the middle pair b,,b,. Then b, = z, while 


by = Apto = Š% by = Act (3-10.10) 




[The last vector is replaceable by pox, if (3) is exactly fulfilled. 
However we will not depend on the exact fulfillment of (3).] 
We now have the five vectors 


(<3, by) (Eo) z (3-10.11) 


and we can form the four basic scalars, 


“kF Zabo 
oTo 
Ci = Lobo = To 
Be ~ LoL o (3-10.12) 
C= Toto = boty = a= zz, 
0 
ce Lito 
0 To 
together with the quadratic equation 
ee l 
Yı 1 Vo = 0 (3-10.13) 
l Y2 Ys 


The terms with y, are frequently negligibly small, in which case our 
equation is reduced to 


ely — yz) + eya —1=0 (3-10.14) 


Here we have the correction in second approximation, which has to
be added to the crude preliminary value $\lambda_0$ of the eigenvalue λ.²




² An interesting application of this method to the solution of algebraic
equations had to be omitted, for the sake of economy.


206 Large-scale Linear Systems Chap. III 


Articles 


[3] Hestenes, M. R., and KARUSH, W., “A Method of Gradients 
for the Calculation of the Characteristic Roots and Vectors of a 
Symmetric Matrix,” J. Research, Nat. Bur. Standards, 47, 45 
(1951). 


[4] Hestenes, M. R., and STIEFEL, E., “Method of Conjugate 
Gradients for Solving Linear Systems,” J. Research, Nat. Bur. 
Standards, 49, 409 (1952). 


[5] Lanczos, C., “An Iteration Method for the Solution of the 
Eigenvalue Problem of Linear Differential and Integra] 
Operators,” J. Research, Nat. Bur. Standards, 45, 255 (1951). 


[6] Lanczos, C., “Solution of Systems of Linear Equations by 
Minimized Iterations,” J. Research, Nat. Bur. Standards, 49, 33 
(1952). 


IV 


HARMONIC ANALYSIS 


1. Historical notes. The discovery of the acoustical importance 
of a fundamental vibration and its overtones is attributed to the 
Greek sage Pythagoras (600 B.C.). The mathematical significance of
analyzing a periodic function in terms of functions of the type sin kx 
and cos kx, where k is an integer, was recognized in the eighteenth 
century by Euler and Lagrange. Since eighteenth century mathe- 
matics did not possess the exact notion of a limit, the true nature of 
an infinite series was not fully realized. Lagrange assumed that any 
superposition of analytical (i.e., infinitely differentiable) functions 
must again give an analytical function. Hence he restricted the 
possibility of a harmonic analysis to functions which were not only 
continuous but which could be differentiated any number of times. 
It was Fourier’s brilliant discovery (‘Théorie analytique de la 
chaleur” (1822)] that such a restriction is not demanded. The 
function may be entirely “unpredictable,” i.e., composed of an 
arbitrary number of arcs which change their analytical law constantly. 
Nor is the continuity of the function necessary. Fourier gave 
examples for harmonic analysis of functions which had a finite 
number of discontinuities in the given fundamental interval. This 
fundamental interval, usually normalized to the range from —7 to 
+7, is all we need for the definition of the function y = f(x). The 
periodicity of the function is accidental and enters the picture only if 
we leave the fundamental interval. The theory of harmonic analysis 
need not leave the given fundamental range. Another fundamental 
discovery of Fourier was the “Fourier integral’ by which he 
generalized the methods of harmonic analysis to an infinite interval 
which ranges from —oo to +00, without requiring any periodicity 
for the function to be analyzed. Later Dirichlet investigated more 

207 


208 Harmonic Analysis Chap. IV 


specifically what conditions the “arbitrary function” of Fourier had 
to satisfy in order to be representable by a harmonic series. These 
conditions, called the “Dirichlet conditions,” are sufficient for the 
convergence of the Fourier series; they are not always necessary, 
however. In recent times (1904) Fejér invented a new method of 
summing the Fourier series by which he greatly extended the validity 
of the series. Using the arithmetic means of the partial sums, 
instead of the partial sums themselves, he could sum series which 
were divergent in themselves. The only condition the function f(z) 
still has to satisfy is the natural condition that the function shall be 
absolutely integrable. 


2. Basic theorems. Let the function y = f(x) satisfy the following 
conditions: 
1. f(x) is defined at every point of the interval 


$$-\pi \le x \le +\pi$$


2. f(x) is everywhere single valued, finite, and sectionally con- 
tinuous. Hence f(x) can have a finite number of discontinuities only, 
and two consecutive discontinuities must be separated by a finite 
interval. 

3. f(x) is of “bounded variation”; this means that f (x) cannot have 
an infinite number of maxima and minima in the given interval. 

These conditions imposed on f(x) are called the “Dirichlet 
conditions.” A function of this type can be expanded into a 
convergent infinite series of the following form: 

$$f(x) = \tfrac{1}{2}a_0 + a_1\cos x + a_2\cos 2x + \cdots + b_1\sin x + b_2\sin 2x + \cdots \tag{4-2.1}$$

where the coefficients $a_k$, $b_k$, called the “Fourier coefficients,” are 
defined as follows: 

$$a_k = \frac{1}{\pi}\int_{-\pi}^{+\pi} f(x)\cos kx\,dx, \qquad b_k = \frac{1}{\pi}\int_{-\pi}^{+\pi} f(x)\sin kx\,dx \tag{4-2.2}$$

At a point of discontinuity x = a we will define f(a) as the arithmetic 
mean of the two limiting ordinates: 

$$f(a) = \tfrac{1}{2}\left[f(a_+) + f(a_-)\right] \tag{4-2.3}$$




This is the value to which the infinite Fourier series converges at 
the fixed point x = a. 
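
As a numerical illustration of (4-2.2) and (4-2.3), the following sketch computes the Fourier coefficients of a sample function with a simple midpoint rule and evaluates the finite sum at an interior point and at the end point $x = \pi$, where the two limiting ordinates are $e^{\pi}$ and $e^{-\pi}$. The sample function $e^x$, the helper names, the quadrature rule, and the number of terms are assumptions made for this illustration; they are not taken from the text.

```python
import math

def integrate(g, a, b, n=2000):
    """Midpoint rule for the integral of g over [a, b] using n panels."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def fourier_coefficients(f, n_terms):
    """Return lists a, b with a[k], b[k] for k = 0..n_terms, following (4-2.2)."""
    a = [integrate(lambda x: f(x) * math.cos(k * x), -math.pi, math.pi) / math.pi
         for k in range(n_terms + 1)]
    b = [integrate(lambda x: f(x) * math.sin(k * x), -math.pi, math.pi) / math.pi
         for k in range(n_terms + 1)]
    return a, b

def partial_sum(a, b, x):
    """Evaluate the finite trigonometric sum a0/2 + sum(a_k cos kx + b_k sin kx)."""
    total = 0.5 * a[0]
    for k in range(1, len(a)):
        total += a[k] * math.cos(k * x) + b[k] * math.sin(k * x)
    return total

# Sample function (an assumption of this sketch): f(x) = exp(x) on [-pi, pi].
a, b = fourier_coefficients(math.exp, 50)
interior = partial_sum(a, b, 1.0)      # to be compared with exp(1)
at_jump = partial_sum(a, b, math.pi)   # to be compared with cosh(pi), the mean of the two limits
print(interior, math.exp(1.0))
print(at_jump, math.cosh(math.pi))
```

With 50 terms the agreement is only rough, which is in line with the slow convergence expected near a discontinuity.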

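A convenient alternate form of the Fourier series can be given in 
the following complex form: 

$$f(x) = c_0 + c_1 e^{ix} + c_2 e^{2ix} + \cdots + c_{-1}e^{-ix} + c_{-2}e^{-2ix} + \cdots = \sum_{k=-\infty}^{+\infty} c_k e^{ikx} \tag{4-2.4}$$

where 

$$c_k = \frac{1}{2\pi}\int_{-\pi}^{+\pi} f(x)\,e^{-ikx}\,dx \tag{4-2.5}$$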


Proof. If the existence and convergence of the series (1) is 
postulated at every point of the given range, we can immediately multiply 
both sides by cos kx or sin kx and integrate from the lower limit 
$-\pi$ to the upper limit $+\pi$. Then the expressions (2) follow 
immediately for the expansion coefficients $a_k$, $b_k$ because on the right 
side everything cancels out except the single term containing cos kx 
or sin kx. So far the results are simple, and were historically well 
established long before Fourier. But are we allowed to assume the 
validity of the expansion (2.1)? In order to answer this question, 
Dirichlet proceeded as follows. We start with the finite series 

$$f_n(x) = \tfrac{1}{2}a_0 + a_1\cos x + \cdots + a_n\cos nx + b_1\sin x + \cdots + b_n\sin nx \tag{4-2.6}$$

assuming for $a_k$, $b_k$ their values (4-2.2), and investigate what happens 
if n increases to infinity. We obtain 


$$f_n(x) = \int_{-\pi}^{+\pi} f(t)\,K_n(x - t)\,dt \tag{4-2.7}$$

where $K_n(\xi)$ is the so-called “Dirichlet kernel”: 

$$K_n(\xi) = \frac{\sin\left(n + \tfrac{1}{2}\right)\xi}{2\pi\sin\tfrac{1}{2}\xi} \tag{4-2.8}$$

Our aim is to show that with n increasing to infinity, $f_n(x)$ approaches 
f(x) with an error which can be made arbitrarily small. This 
requires a very strong focusing power of the function $K_n(\xi)$. It must 




put the spotlight on the immediate neighborhood of $\xi = 0$ and blot 
out everything else. In particular, the following two conditions 
should be expected as the exact mathematical expressions of the 
increasing focusing power of $K_n(\xi) = K_n(-\xi)$: 

$$\lim_{n\to\infty}\int_{\varepsilon}^{\pi} |K_n(\xi)|\,d\xi = 0 \tag{4-2.9}$$

$$\lim_{n\to\infty}\int_{-\varepsilon}^{+\varepsilon} K_n(\xi)\,d\xi = 1 \tag{4-2.10}$$

The first condition guarantees that the kernel function blots out 
everything except the immediate neighborhood of the point t = x. 
The second condition guarantees that the point t = x enters the 
integration with the proper weight; $\varepsilon$ is assumed to be a prescribed 
arbitrarily small positive number, independent of n. 

However, the Dirichlet kernel does not satisfy the first condition. 
The secondary maxima of the function (2.8) are not small enough to 
ensure predominance of the point $\xi = 0$. This is the reason that 
the expandability of f(x) into an infinite convergent Fourier series has 
to be restricted to a definite class of functions which are sufficiently 
smooth to counteract the insufficient focusing power of the Dirichlet 
kernel. The Dirichlet conditions 2 and 3 given above aim at 
establishing the desired smoothness of f(x).¹ 

Fejér’s method. By an ingenious modification of the summation 
procedure, L. Fejér succeeded in increasing the focusing power of the 
Dirichlet kernel and thus extending the validity of the Fourier series 
to a much larger class of functions. We will not restrict f(x) by any 
conditions, except the single condition that f(x) is “absolutely 
integrable,” which means that the quantity 

$$\int_{-\pi}^{+\pi} |f(x)|\,dx \tag{4-2.11}$$

exists (“absolutely integrable functions”). This condition is entirely 


1 The Dirichlet conditions are sufficient but unnecessarily stringent, although 
they comprise a very large class of functions. The minimum restrictions required 
for the convergence of the Fourier series are not yet known. Fejér’s method 
obviates the problem by summing the series in a different way. 

2 Cf. [15], p. 169. 




natural, since otherwise the Fourier coefficients $a_k$ and $b_k$, defined 
by (2), or $c_k$, defined by (5), would not exist either. 

We can now construct the formal series (1) or (4). Generally, these 
series will not converge. This means that the sequence $f_n(x)$, 
defined by (6) taken at a definite x, will not approach a definite 
number as n increases to infinity. We call these $f_n(x)$ the “partial 
sums,” since we have summed the series up to a certain point. We 
will change our notation and call them $s_n(x)$. We substitute a 
definite value for x and consider the sequence 

$$s_0,\ s_1,\ s_2,\ s_3,\ \ldots$$

Out of this sequence we construct a new sequence by taking the 
arithmetic means of the previous sequence: 

$$f_n(x) = \frac{s_0 + s_1 + s_2 + \cdots + s_{n-1}}{n} \tag{4-2.12}$$

This new sequence has the remarkable property that it converges to 
a definite limit at all points at which the one-sided limits $f_+ = f(x_+)$ 
and $f_- = f(x_-)$ exist, defined by 

$$\lim_{\varepsilon\to 0}\,[f(x + \varepsilon) - f_+] = 0, \qquad \lim_{\varepsilon\to 0}\,[f(x - \varepsilon) - f_-] = 0 \qquad (\varepsilon > 0) \tag{4-2.13}$$

At all such points $f_n(x)$ converges to the arithmetic mean of the two 
limits (13). [In an ordinary point the two limits coincide, and the 
series simply converges to f(x).] Hence $f_n(x)$ automatically 
converges to a reasonable value at all points at which a reasonable 
answer can be expected at all. 

Fejér’s results come about by the fact that his method is associated 
with the following kernel function: 

$$K_n(\xi) = \frac{\sin^2(n\xi/2)}{2\pi n\,\sin^2(\xi/2)} \tag{4-2.14}$$

This kernel possesses the strong focusing properties expressed by the 
conditions (9) and (10). 
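
The difference in focusing power between the two kernels can be checked numerically. The sketch below estimates the integral of $|K_n(\xi)|$ over $[\varepsilon, \pi]$, as in condition (4-2.9), for the Dirichlet kernel (4-2.8) and for Fejér's kernel (4-2.14). The value $\varepsilon = 0.5$, the midpoint rule, and the function names are assumptions of this illustration.

```python
import math

def dirichlet_kernel(n, xi):
    """Dirichlet kernel (4-2.8); xi must not be a multiple of 2*pi."""
    return math.sin((n + 0.5) * xi) / (2.0 * math.pi * math.sin(0.5 * xi))

def fejer_kernel(n, xi):
    """Fejer kernel (4-2.14); xi must not be a multiple of 2*pi."""
    return math.sin(0.5 * n * xi) ** 2 / (2.0 * math.pi * n * math.sin(0.5 * xi) ** 2)

def tail_integral(kernel, n, eps, panels=20000):
    """Midpoint-rule estimate of the integral of |K_n| over [eps, pi], cf. (4-2.9)."""
    h = (math.pi - eps) / panels
    return h * sum(abs(kernel(n, eps + (i + 0.5) * h)) for i in range(panels))

# Columns: n, Dirichlet tail, Fejer tail (eps = 0.5 is an arbitrary choice).
for n in (10, 50, 200):
    print(n, tail_integral(dirichlet_kernel, n, 0.5), tail_integral(fejer_kernel, n, 0.5))
```

In such a run the Fejér column should shrink roughly like 1/n, while the Dirichlet column should level off at a nonzero value, which is the failure of the first condition described above.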


3. Least-squares approximations. While the interest of pure 
analysis is essentially centered around the general convergence 




properties of the Fourier series, the interest of practical analysis is 
centered around the possibility of representing a fairly large but not 
too irregular class of functions effectively by a relatively small 
number of harmonic components. Here it is no longer a question of 
obtaining the given y = f(x) with an arbitrarily small error but of 
approximating f (x) with a finite error which, however, shall be made 
as small as possible. The Fourier series belongs to that class of 
expansions which approximate a given function y= f(x) in a 
definite finite interval uniformly well, in the sense that we do not try 
to make the approximation very close in the neighborhood of 
a certain point at the cost of obtaining a large error away from that 
point, but we attach equal importance to every point of the region. 

The principle involved in this kind of interpolation is given by the 
“method of least squares.” We define the error $\eta(x)$ of a finite 
expansion by forming the difference between the given y = f(x) and 
the finite sum 

$$f_m(x) = c_1\varphi_1(x) + c_2\varphi_2(x) + \cdots + c_m\varphi_m(x) \tag{4-3.1}$$

where $\varphi_1(x), \varphi_2(x), \ldots, \varphi_m(x)$ are prescribed functions, while the 
coefficients $c_1, c_2, \ldots, c_m$ are at our disposal. We call the difference 
$\eta_m(x)$: 

$$\eta_m(x) = f(x) - f_m(x) \tag{4-3.2}$$

Instead of trying to let $\eta_m(x)$ vanish at m more or less arbitrarily 
prescribed points, we characterize the “average error” by integrating 
over the finite range 

$$a \le x \le b \tag{4-3.3}$$

in which the function f(x) shall be represented. We define this 
average error by the definite integral 

$$\bar{\eta}_m^2 = \frac{1}{b - a}\int_a^b \eta_m^2(x)\,dx \tag{4-3.4}$$

and we determine the coefficients $c_k$ of the expansion (1) by the 
condition that $\bar{\eta}_m^2$ shall become as small as possible. This problem 
always has a definite solution, since an essentially positive quantity 
always assumes a definite absolute minimum for a certain choice of 
the variables. 
Our problem is greatly facilitated if the given basic functions 

$$\varphi_1(x),\ \varphi_2(x),\ \ldots,\ \varphi_m(x) \tag{4-3.5}$$

satisfy the following “orthogonality condition”: 

$$\int_a^b \varphi_i(x)\,\varphi_k(x)\,dx = 0 \qquad (i \ne k) \tag{4-3.6}$$

A set of functions which satisfies this condition is called an 
“orthogonal set.” If in addition we apply to every $\varphi_k(x)$ such a factor of 
proportionality that we shall have 

$$\int_a^b \varphi_k^2(x)\,dx = 1 \tag{4-3.7}$$

we speak of an “orthogonal and normalized” or briefly 
“ortho-normal” set of functions. 

Let us observe that for such a function system the quantity $f_m^2(x)$, 
integrated over the given range, appears in a particularly simple form: 

$$\int_a^b f_m^2(x)\,dx = c_1^2 + c_2^2 + \cdots + c_m^2 \tag{4-3.8}$$

Furthermore, the integral of $\eta_m^2(x)$ becomes likewise greatly simplified: 

$$\int_a^b \eta_m^2(x)\,dx = \int_a^b f^2(x)\,dx - 2(\gamma_1 c_1 + \gamma_2 c_2 + \cdots + \gamma_m c_m) + c_1^2 + c_2^2 + \cdots + c_m^2 \tag{4-3.9}$$

where 

$$\gamma_k = \int_a^b f(x)\,\varphi_k(x)\,dx \tag{4-3.10}$$

Now the expression (9) is a simple algebraic function of the 
variables $c_k$, and the condition of minimum requires 

$$c_k = \gamma_k \tag{4-3.11}$$

With this choice of the $c_k$ the average error $\bar{\eta}_m^2$ of the approximation 
becomes 

$$\bar{\eta}_m^2 = \frac{1}{b - a}\left[\int_a^b f^2(x)\,dx - (\gamma_1^2 + \gamma_2^2 + \cdots + \gamma_m^2)\right] \tag{4-3.12}$$


If the given f(x) happens to be a linear combination of the m given 
$\varphi_k(x)$, the choice (11) of the $c_k$ will give an expansion (1) which 
coincides with f(x) identically. In that case $\bar{\eta}_m^2$ will turn out to be 
zero. For any other choice of f(x), however, $\bar{\eta}_m^2$ will remain a 
positive quantity which can be further reduced only by adding more 
functions $\varphi_{m+1}(x), \ldots$ to the previous set. 

The great advantage of an orthogonal set of functions is that 
addition of a new member to the set has no damaging influence on 
our previous calculations. We need not change anything in the 
previous coefficients, but merely determine the additional coefficient 
$c_{m+1}$, which depends solely on $\varphi_{m+1}(x)$. The average error $\bar{\eta}_{m+1}^2$ is 
now further diminished, since $-\gamma_{m+1}^2$ is added inside the bracket of 
the right side. If $\gamma_{m+1}$ happens to come out as zero, this indicates that 
the addition of $\varphi_{m+1}(x)$ was not able to further decrease the average 
error; then $\varphi_{m+2}(x)$ has to be tried, and so on. 
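
The formulas (4-3.10) to (4-3.12) can be tried out on a small example. The sketch below assumes the ortho-normal set $\varphi_k(x) = \sqrt{2/\pi}\,\sin kx$ on $[0, \pi]$ and the sample function $f(x) = x(\pi - x)$; both choices, the midpoint quadrature, and all names are assumptions of this illustration. It compares the average error obtained from (4-3.12) with the one obtained by integrating the squared residual directly, as in (4-3.4).

```python
import math

A, B = 0.0, math.pi   # the range a <= x <= b of (4-3.3)

def integrate(g, n=2000):
    """Midpoint rule for the integral of g over [A, B]."""
    h = (B - A) / n
    return h * sum(g(A + (i + 0.5) * h) for i in range(n))

def phi(k, x):
    """Assumed ortho-normal set on [0, pi]: sqrt(2/pi) * sin(k x)."""
    return math.sqrt(2.0 / math.pi) * math.sin(k * x)

def f(x):
    """Sample function for this sketch."""
    return x * (math.pi - x)

def gamma(k):
    """Expansion coefficient (4-3.10)."""
    return integrate(lambda x: f(x) * phi(k, x))

def mean_square_error_formula(m):
    """Average error from (4-3.12), using only the gammas."""
    f2 = integrate(lambda x: f(x) ** 2)
    return (f2 - sum(gamma(k) ** 2 for k in range(1, m + 1))) / (B - A)

def mean_square_error_direct(m):
    """Average error from the definition (4-3.4), integrating the squared residual."""
    c = [gamma(k) for k in range(1, m + 1)]
    def residual(x):
        return f(x) - sum(ck * phi(k, x) for k, ck in enumerate(c, start=1))
    return integrate(lambda x: residual(x) ** 2) / (B - A)

for m in (1, 2, 3):
    print(m, gamma(m), mean_square_error_formula(m), mean_square_error_direct(m))
```

For this particular f(x) the coefficient $\gamma_2$ comes out as zero (up to rounding), so the step from m = 1 to m = 2 leaves the average error unchanged; this is the situation described in the last sentence above.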


4. The orthogonality of the Fourier functions. The functions of 
the Fourier series give us an example of an orthogonal set. The 
range is from $-\pi$ to $+\pi$, or from 0 to $2\pi$. 

$$\int_{-\pi}^{+\pi} e^{imx}\,e^{inx}\,dx = \left[\frac{e^{i(m+n)x}}{i(m+n)}\right]_{-\pi}^{+\pi} = 0 \qquad (m \ne -n) \tag{4-4.1}$$

However, for functions which assume complex values, the concept of 
orthogonality has to be slightly modified. Here we replace the 
quantity $\eta_m^2(x)$ by $\eta_m(x)\,\eta_m^*(x)$, where the * refers to the operation: 
“change i to −i.” Accordingly, orthogonality shall mean 

$$\int_a^b \varphi_i(x)\,\varphi_k^*(x)\,dx = 0 \qquad (i \ne k) \tag{4-4.2}$$

and normalization of the $\varphi_k(x)$ occurs by the condition 

$$\int_a^b \varphi_k(x)\,\varphi_k^*(x)\,dx = 1 \tag{4-4.3}$$

Furthermore, the $\gamma_k$ are here defined by 

$$\gamma_k = \int_a^b f(x)\,\varphi_k^*(x)\,dx \tag{4-4.4}$$

and these $\gamma_k$ become the coefficients of the orthogonal expansion 
which minimizes the average error. 




The ortho-normal functions of the complex Fourier series are the 
functions 

$$\varphi_k = \frac{1}{\sqrt{2\pi}}\,e^{ikx} \tag{4-4.5}$$

and we obtain 

$$\gamma_k = \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{+\pi} f(x)\,e^{-ikx}\,dx \tag{4-4.6}$$

The resultant series is equivalent to the previous complex form (2.4), 
(2.5) of the Fourier series. 

We observe that the infinite Fourier expansion (2.4) can be 
conceived as the process of constantly minimizing the average error, 
by taking in more and more orthogonal functions. One can show 
that with n growing to infinity the average error [cf. (3.4)] converges to 
zero for any “quadratically integrable function” of the range, i.e., for 
any function for which the integral 

$$\int_{-\pi}^{+\pi} |f(x)|^2\,dx \tag{4-4.7}$$

exists. And yet the peculiar phenomenon occurs that from this fact 
we cannot draw conclusions as to the vanishing of the local error 
$\eta(x)$ itself. If the average error vanished exactly, this would necessarily 
mean the vanishing of $\eta(x)$ at every point, since the integral of an 
everywhere nonnegative function can vanish only if that function 
vanishes identically. This would prove that the Fourier series gives 
the correct functional value at every point of the range. What we can 
prove, however, is only that the average error can be made as small as 
we wish, by increasing n to a properly large value. This is not enough 
to prove that the local error itself converges to zero at every point of 
the range. For this purpose more has to be required of the function 
f(x) than quadratic integrability (for example, the Dirichlet 
conditions, cf. § 2). 


5. Separation of the sine and the cosine series. For many purposes 
of applied analysis it is convenient to separate the sine part and the 
cosine part of the Fourier series. This can be done by separating the 
function f(x) into an even and an odd part. We put 

$$g(x) = \tfrac{1}{2}[f(x) + f(-x)], \qquad h(x) = \tfrac{1}{2}[f(x) - f(-x)] \tag{4-5.1}$$

Then 

$$f(x) = g(x) + h(x) \tag{4-5.2}$$

with 

$$g(-x) = g(x) \tag{4-5.3}$$

$$h(-x) = -h(x) \tag{4-5.4}$$

A function with the symmetry property (3) is called an “even,” 
one with the symmetry property (4) an “odd” function. In the first 
case the sine part of the Fourier series drops out, in the second case 
the cosine part. In both cases it suffices to give the function only in 
the range 

$$0 \le x \le \pi \tag{4-5.5}$$

which is the half range of the regular Fourier series. In the other 
half the reflection properties already define the function. 
We now obtain 

$$g(x) = \tfrac{1}{2}a_0 + a_1\cos x + a_2\cos 2x + \cdots \tag{4-5.6}$$

with 

$$a_k = \frac{2}{\pi}\int_0^{\pi} g(x)\cos kx\,dx \tag{4-5.7}$$

and 

$$h(x) = b_1\sin x + b_2\sin 2x + \cdots \tag{4-5.8}$$

with 

$$b_k = \frac{2}{\pi}\int_0^{\pi} h(x)\sin kx\,dx \tag{4-5.9}$$


Another point of departure is to assume that the function f(x) 
is from the very beginning defined in the interval (5) only. In that 
case we have complete freedom to define the function for negative 
values of x either by the condition (3) or by the condition (4). Hence 
the very same function f(x) of the interval $[0, \pi]$ can be expanded by 
either a cosine or a sine series. Both expansions are complete in 
themselves and converge to the given f(x), provided that f(x) belongs 
to the class of functions covered by the Dirichlet conditions. 
However, the convergence of these series may be vastly different. Very 
frequently we need the Fourier expansion of a function which is 
everywhere continuous and even differentiable in the given interval. 
In this case the convergence of the Fourier series will be determined 
by the properties of the function at the boundary points x = 0 and 
$x = \pi$. If the value of f(x) is not zero at the two boundary points, the 
reflection as an odd function (4) will cause a discontinuity at the 
two points x = 0 and $x = \pi$. The discontinuity at x = 0 follows 
directly from the law of reflection (4), the discontinuity at $x = \pi$ from 
the periodicity condition 

$$f(x + 2\pi) = f(x) \tag{4-5.10}$$

which, if applied to the point $x = -\pi$, demands 

$$f(\pi) = f(-\pi) \tag{4-5.11}$$

Both discontinuities are avoided if f(x) is reflected as an even 
function. Then the discontinuity appears in the first derivative only 
but not in the function itself. Hence in expanding a function which is 
not restricted by any particular boundary conditions, the cosine 
series (7) will have much better convergence properties than the sine 
series (9); the latter expansion will show the undesirable oscillations 
due to the “Gibbs phenomenon” encountered in the neighborhood 
of a discontinuity (cf. § 9). 

On the other hand, let us assume that f(x) vanishes at the two 
end points of the given range. Such a condition can always be met 
if it is permissible to subtract from f(x) a linear trend of the form 
$\alpha + \beta x$, with suitably chosen $\alpha$ and $\beta$. The new function satisfies 
the boundary conditions 

$$f(0) = 0, \qquad f(\pi) = 0 \tag{4-5.12}$$

For a function of this type the sine series has much better convergence 
than the cosine series, since the reflection as an odd function now 
preserves continuity in function and first derivative, the discontinuity 
appearing only in the second derivative, at the points x = 0 and 
$x = \pi$. 

We can estimate the order of magnitude of the Fourier coefficients. 
In an integral of the type 

$$\int f(x)\cos kx\,dx \tag{4-5.13}$$

not much can be said as long as k is small. For large k, however, the 
first factor is smooth compared with the rapid oscillations of the 
second factor. Hence we can integrate by parts, obtaining 

$$\int f(x)\cos kx\,dx = \frac{f(x)\sin kx}{k} - \frac{1}{k}\int f'(x)\sin kx\,dx \tag{4-5.14}$$

Repeating the process once more, 

$$\int f'(x)\sin kx\,dx = -\frac{f'(x)\cos kx}{k} + \frac{1}{k}\int f''(x)\cos kx\,dx \tag{4-5.15}$$

Putting in the limits 0 and $\pi$, the dominant term for large k becomes 

$$\int_0^{\pi} f(x)\cos kx\,dx = \frac{1}{k^2}\Big[f'(x)\cos kx\Big]_0^{\pi} = \frac{(-1)^k f'(\pi) - f'(0)}{k^2} \tag{4-5.16}$$

while 

$$\int_0^{\pi} f(x)\sin kx\,dx = -\frac{1}{k}\Big[f(x)\cos kx\Big]_0^{\pi} = \frac{f(0) - (-1)^k f(\pi)}{k} \tag{4-5.17}$$

This shows that the coefficients of the cosine series decrease with 
the speed $k^{-2}$, the coefficients of the sine series only with the speed 
$k^{-1}$, if f(x) does not satisfy any specific boundary conditions. 
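
The dominant terms (4-5.16) and (4-5.17) can be compared with directly computed integrals. The sketch below assumes the sample function $f(x) = e^{-x}$ on $[0, \pi]$, which satisfies no particular boundary condition; this choice, the midpoint quadrature, and the function names are assumptions of the illustration.

```python
import math

def integrate(g, a, b, n=5000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """Sample function; f(0) and f(pi) are both nonzero."""
    return math.exp(-x)

def df(x):
    """Derivative of the sample function."""
    return -math.exp(-x)

def cosine_integral(k):
    return integrate(lambda x: f(x) * math.cos(k * x), 0.0, math.pi)

def sine_integral(k):
    return integrate(lambda x: f(x) * math.sin(k * x), 0.0, math.pi)

def cosine_dominant(k):
    """Dominant term (4-5.16): ((-1)^k f'(pi) - f'(0)) / k^2."""
    return ((-1) ** k * df(math.pi) - df(0.0)) / k ** 2

def sine_dominant(k):
    """Dominant term (4-5.17): (f(0) - (-1)^k f(pi)) / k."""
    return (f(0.0) - (-1) ** k * f(math.pi)) / k

for k in (5, 10, 20, 40):
    print(k, cosine_integral(k), cosine_dominant(k), sine_integral(k), sine_dominant(k))
```

The ratio of each integral to its dominant term should approach 1 as k grows, and the sine integrals should be larger than the cosine integrals by a factor of order k.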

If, however, f(x) satisfies the boundary conditions (12), the right 
side of (17) vanishes, and we obtain for the dominant term, 

$$\int_0^{\pi} f(x)\sin kx\,dx = \frac{1}{k^3}\Big[f''(x)\cos kx\Big]_0^{\pi} = \frac{(-1)^k f''(\pi) - f''(0)}{k^3} \tag{4-5.18}$$

The speed with which the coefficients decrease is now $k^{-3}$, which will 
be satisfactory for many applications of the Fourier series. 

We can transfer our results from the range $[0, \pi]$ to the range 
$[-\pi, \pi]$ by calling the previous variable $x_1$ and transforming it to 
the new variable 

$$x = -\pi + 2x_1 \tag{4-5.19}$$

and finally returning to the notation x. We thus obtain for a function 
f(x) which satisfies the boundary conditions 

$$f(\pm\pi) = 0 \tag{4-5.20}$$

the expansion 

$$f(x) = a_1\cos\frac{x}{2} + a_2\cos\frac{3x}{2} + \cdots + a_k\cos\frac{(2k-1)x}{2} + \cdots + b_1\sin x + b_2\sin 2x + \cdots + b_k\sin kx + \cdots \tag{4-5.21}$$

with 

$$a_k = \frac{1}{\pi}\int_{-\pi}^{+\pi} f(x)\cos\frac{(2k-1)x}{2}\,dx, \qquad b_k = \frac{1}{\pi}\int_{-\pi}^{+\pi} f(x)\sin kx\,dx \tag{4-5.22}$$

The series (21) has faster convergence than the original Fourier 
series (2.1) if the function f(x) satisfies the boundary conditions (20); 
its coefficients decrease with the speed $k^{-3}$, instead of the speed $k^{-2}$ 
which holds if the same function is analyzed in the original Fourier 
functions. 
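
A small numerical comparison of the two sets of functions may be useful here. The sketch assumes the sample function $f(x) = \pi^2 - x^2$, which satisfies (4-5.20); the closed forms quoted in the comments were worked out for this sample function only and, like the function names, are assumptions of the illustration.

```python
import math

def integrate(g, n=4000):
    """Midpoint rule for the integral of g over [-pi, pi]."""
    h = 2.0 * math.pi / n
    return h * sum(g(-math.pi + (i + 0.5) * h) for i in range(n))

def f(x):
    """Sample function vanishing at x = +pi and x = -pi."""
    return math.pi ** 2 - x ** 2

def ordinary_coefficient(k):
    """a_k of the original Fourier series (4-2.2); here -4 (-1)^k / k^2."""
    return integrate(lambda x: f(x) * math.cos(k * x)) / math.pi

def half_angle_coefficient(k):
    """a_k of (4-5.22) with cos((2k-1)x/2); here 4 (-1)^(k+1) / (pi w^3), w = k - 1/2."""
    w = 0.5 * (2 * k - 1)
    return integrate(lambda x: f(x) * math.cos(w * x)) / math.pi

for k in (2, 5, 10, 20):
    print(k, ordinary_coefficient(k), half_angle_coefficient(k))
```

The first column of coefficients falls off like $k^{-2}$, the second like $k^{-3}$, in agreement with the statement above.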


6. Differentiation of a Fourier series. Let us consider the finite 
expansion 

$$f_m(x) = \sum_{k=-(m-1)}^{m-1} c_k e^{ikx} \tag{4-6.1}$$

together with the residual 

$$\eta_m(x) = \sum_{k=m}^{\infty}\left(c_k e^{ikx} + c_{-k}e^{-ikx}\right) \tag{4-6.2}$$

The residual may be written as follows: 

$$\eta_m(x) = e^{imx}\sum_{k=0}^{\infty} c_{m+k}\,e^{ikx} + e^{-imx}\sum_{k=0}^{\infty} c_{-m-k}\,e^{-ikx} \tag{4-6.3}$$

It suffices to consider the first term of the right side only. We can put 

$$\sum_{k=0}^{\infty} c_{m+k}\,e^{ikx} = \rho_m(x) \tag{4-6.4}$$

Then $\rho_m(x)$ is a generally smooth function which does not show any 
rapid oscillations. On the other hand, $e^{imx}$ is a rapidly oscillating 
function. Hence the error term of the Fourier series has the character 
of a “modulated carrier wave” of high frequency. If we formally 
differentiate the truncated series (1) and compare it with the value of 
$f'(x)$, we obtain the large error, 

$$\eta_m'(x) = im\,e^{imx}\rho_m(x) + e^{imx}\rho_m'(x) - im\,e^{-imx}\rho_{-m}(x) + e^{-imx}\rho_{-m}'(x) \tag{4-6.5}$$

The primed terms arise from the differentiation of the modulation 
and do not cause any serious difficulty. However, the other 
terms contain the large factor m. It is this fact which causes a 
serious loss of accuracy in formally differentiating the Fourier series. 
We have differentiated the modulation and the carrier wave. We 
should differentiate the modulation only, without the carrier wave. 
This aim can be achieved by the following procedure. Let us 
replace the process of differentiation by the following differencing 
process: 

$$\mathcal{D}_m f(x) = \frac{f(x + \pi/m) - f(x - \pi/m)}{2\pi/m} \tag{4-6.6}$$

The operator $\mathcal{D}_m$ does not coincide with the operator D of ordinary 
differentiation but the difference gets smaller and smaller as m 
increases to infinity. In the limit 

$$D = \frac{d}{dx} = \lim_{m\to\infty}\mathcal{D}_m \tag{4-6.7}$$

Now the operator $\mathcal{D}_m$, if applied to $\eta_m(x)$, picks out two points on 
the carrier wave which are exactly in the same phase, viz., ±180° 
away from the phase at the point x. For this reason the operator $\mathcal{D}_m$ 
applies merely the factor −1 to the carrier wave, without 
differentiating it. We obtain 

$$\mathcal{D}_m\eta_m(x) = -e^{imx}\,\mathcal{D}_m\rho_m(x) - e^{-imx}\,\mathcal{D}_m\rho_{-m}(x) \tag{4-6.8}$$

Now $\rho_m(x)$, $\rho_{-m}(x)$ are smooth functions whose differentiation does 
not cause any increase in the order of magnitude of the error. 
Consequently the convergence of the Fourier series was not damaged 
by this operation. In the limit, as m grows to infinity, the derivative 
of f(x) is obtained at all points where this derivative exists. 

This method of differentiating the Fourier series is equivalent to 
multiplying the coefficients of the formally differentiated series by 
certain preassigned weight factors $\sigma_k$ which depend on m, the order of 
the partial sums. Consider the term associated with $e^{ikx}$: 

$$\mathcal{D}_m e^{ikx} = \frac{\sin(\pi k/m)}{\pi/m}\,i e^{ikx} = \frac{\sin(\pi k/m)}{\pi k/m}\,ik\,e^{ikx} \tag{4-6.9}$$

The correction factor is thus 

$$\sigma_k = \frac{\sin(\pi k/m)}{\pi k/m} \tag{4-6.10}$$

Since the same $\sigma_k$ holds for both $\pm k$, the same factor is applied to 
both $c_k$ and $c_{-k}$. Moreover, the $a_k$, $b_k$ coefficients of the real form of 
the Fourier series are linear combinations of $c_k$ and $c_{-k}$: 

$$a_k = c_k + c_{-k}, \qquad b_k = i(c_k - c_{-k}) \tag{4-6.11}$$

Hence the coefficients $a_k$ and $b_k$ receive likewise the same 
“attenuation factors” $\sigma_k$. The factor $\sigma_k$ is 1 only for k = 0; for increasing k 
the $\sigma_k$ decrease monotonically and become almost zero for the 
highest subscript k = m − 1. 

The strong attenuation of the Fourier coefficients of high order 
successfully counteracts their tendency to make the series divergent. 
Hence a series which would otherwise be completely divergent can 
be made convergent by application of the $\sigma_k$-factors. 
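
The relation (4-6.9) and the sign flip of the carrier wave in (4-6.8) are easy to verify numerically. In the sketch below the function names, the sample values of m, k, x, and the smooth modulation `rho` are assumptions chosen only for illustration.

```python
import cmath
import math

def sigma(k, m):
    """Attenuation factor (4-6.10); sigma(0, m) is taken as its limiting value 1."""
    if k == 0:
        return 1.0
    t = math.pi * k / m
    return math.sin(t) / t

def d_m(f, x, m):
    """Differencing operator (4-6.6): central difference with step pi/m."""
    h = math.pi / m
    return (f(x + h) - f(x - h)) / (2.0 * h)

m, k, x = 8, 3, 0.7   # arbitrary sample values

# (4-6.9): the operator acts on e^{ikx} like sigma_k times ordinary differentiation.
lhs = d_m(lambda t: cmath.exp(1j * k * t), x, m)
rhs = sigma(k, m) * 1j * k * cmath.exp(1j * k * x)
print(abs(lhs - rhs))

# (4-6.8): applied to a carrier wave e^{imx} times a smooth modulation, the operator
# differences the modulation only and multiplies the carrier by -1.
rho = lambda t: 1.0 / (2.0 + math.cos(t))   # hypothetical smooth modulation
carrier_lhs = d_m(lambda t: cmath.exp(1j * m * t) * rho(t), x, m)
carrier_rhs = -cmath.exp(1j * m * x) * d_m(rho, x, m)
print(abs(carrier_lhs - carrier_rhs))

print([round(sigma(j, m), 4) for j in range(m)])   # factors for k = 0 .. m-1
```

Both printed differences should be at the level of rounding error, and the list of factors shows the monotonic decrease from 1 toward small values at k = m − 1.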


7. Trigonometric expansion of the delta function. A good example 
is provided by the “square wave,” defined by the conditions 

$$f(-x) = -f(x), \qquad f(x) = \tfrac{1}{2} \quad (0 < x < \pi), \qquad f(0) = f(\pi) = 0 \tag{4-7.1}$$

The corresponding Fourier series, 

$$y(x) = \frac{2}{\pi}\left(\sin x + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \cdots\right) \tag{4-7.2}$$

converges at every point of the interval $[-\pi, +\pi]$ to f(x), but the 
convergence is slow. If we differentiate the given f(x), we obtain 

$$f'(x) = 0 \tag{4-7.3}$$

at every point, except at x = 0 where the derivative does not exist. 
The formal differentiation of the Fourier series, on the other hand, 
yields 

$$y'(x) = \frac{2}{\pi}\left(\cos x + \cos 3x + \cos 5x + \cdots\right) \tag{4-7.4}$$

This series does not converge to a definite limit at any point x 
(except at $x = \pm\pi/2$). Hence the strong singularity at the point 
x = 0 irradiates its effect to every point of the region and destroys 
the convergence of the Fourier series everywhere, although $f'(x)$ is 
entirely regular at every point, except at the origin x = 0. 

If we now replace $y'(x)$ by the series 

$$y'_{2m}(x) = \frac{2}{\pi}\left[\frac{\sin(\pi/2m)}{\pi/2m}\cos x + \frac{\sin(3\pi/2m)}{3\pi/2m}\cos 3x + \cdots + \frac{\sin[(2m-1)\pi/2m]}{(2m-1)\pi/2m}\cos(2m-1)x\right] \tag{4-7.5}$$

we get an entirely different behavior. As m grows to infinity, $y'_{2m}(x)$ 
converges to zero at every fixed point x, except the point x = 0, 
where the function grows beyond all bounds. The function grows 
to infinity in such a way that the area under the curve, taken between 
the points $x = \pm\varepsilon/2$, where $\varepsilon$ is an arbitrarily small but fixed 
quantity, converges to 1. 

The derivative of the function (1) is frequently called the “delta 
function,” and is denoted by $\delta(x)$. It is not a legitimate function, 
since the original function cannot be differentiated at the point 
x = 0. It is, however, the limit of a legitimate function and useful 
for many analytical purposes. We define this function as zero 
everywhere, except between the limits $\pm\varepsilon/2$, where $\varepsilon$ tends toward 
zero. At x = 0 the function goes to infinity in such a way that the 
area under the function shall be 1. 

In physical interpretation the delta function represents a pulse of 
the intensity 1, applied during a time interval $\varepsilon$ around the point 
x = 0, letting $\varepsilon$ converge to zero. The formal definition of the delta 
function leads to the infinite series 

$$y = \frac{1}{\pi}\left(\tfrac{1}{2} + \cos x + \cos 2x + \cdots\right) \tag{4-7.6}$$

which does not converge at any point. If, however, the $\sigma$-factors are 
applied and we consider the expansion 

$$y_m = \frac{1}{\pi}\left[\tfrac{1}{2} + \sigma_1\cos x + \sigma_2\cos 2x + \cdots + \sigma_{m-1}\cos(m-1)x\right] \tag{4-7.7}$$

where the $\sigma_k$ are defined by (6.10), we obtain an expansion which can 
be considered the trigonometric representation of the delta function. 
The following figure plots the course of the function thus obtained. 

In this figure the variable $\xi$ represents the product 

$$\xi = \frac{m}{\pi}\,x \tag{4-7.8}$$

Hence the points ±1 of the figure are the points $\pm\pi/m$ in terms of x. 
Moreover the amplitudes of $y(\xi)$ are in the following relation to 
$y_m(x)$: 

$$y_m(x) = m\,y(\xi) = m\,y\!\left(\frac{m}{\pi}\,x\right) \tag{4-7.9}$$

As m increases, the figure is reduced to smaller and smaller portions 
of the x axis, while the maximum amplitude gets larger and larger. 
The figure shrinks by the factor m in the x direction, while it grows 
by the factor m in the y direction. As m grows to infinity, $y_m$ 
converges to 0 at every point x, except at x = 0. At the same time the 
area under the curve, evaluated with the help of the series, remains 
constantly 1. 
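
The three properties just stated (unit area, growth of the peak in proportion to m, decay away from the origin) can be observed on the smoothed sum (4-7.7) directly. The function names, the midpoint rule used for the area, and the sample values of m and x are assumptions of this sketch.

```python
import math

def sigma(k, m):
    """Attenuation factor (4-6.10) for k >= 1."""
    t = math.pi * k / m
    return math.sin(t) / t

def delta_m(x, m):
    """Smoothed trigonometric sum (4-7.7) of order m."""
    total = 0.5
    for k in range(1, m):
        total += sigma(k, m) * math.cos(k * x)
    return total / math.pi

def area(m, panels=2000):
    """Midpoint-rule estimate of the integral of delta_m over [-pi, pi]."""
    h = 2.0 * math.pi / panels
    return h * sum(delta_m(-math.pi + (i + 0.5) * h, m) for i in range(panels))

# Columns: m, peak value at x = 0, value at the fixed point x = 1, area under the curve.
for m in (10, 20, 40):
    print(m, delta_m(0.0, m), delta_m(1.0, m), area(m))
```

Doubling m should roughly double the peak value while the area stays at 1 and the value at the fixed point x = 1 shrinks.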


8. Extension of the trigonometric series to nonintegrable functions. 
We have seen that the absolute integrability of f(x) had to be 
demanded in order to expand y = f(x) into a Fourier series. Hence 
y = log x is a function which between 0 and $\pi$ can be expanded into 
a cosine (or a sine) series because the integral of the function is 
bounded, although the function itself becomes infinite at x = 0. On 
the other hand, the function $y = x^{-1}$ cannot be expanded, since now 
the function goes so strongly to infinity that even the area under the 
curve becomes infinite. The Fourier coefficients $a_k$, $b_k$ go out of 
bounds for such a function. 

However, $x^{-1}$ can be considered as the derivative of log x. If we 
have a procedure by which we can differentiate a convergent Fourier 
series without losing its convergence, then we can first expand 
y = log x into a Fourier series and then by differentiation obtain 
$x^{-1}$. This can be done by application of the $\sigma_k$ factors. The formal 
derivative of the series for log x does not converge at any point. 
But if the formal coefficients are multiplied by $\sigma_k(m)$, then with 
increasing m the modified series converges to the function $x^{-1}$, and 
the error can be made arbitrarily small, with only the exception of the 
point x = 0, where the derivative does not exist and the function 
grows to infinity. The cosine series likewise grows to infinity at this 
point, while the sine series remains constantly zero. And yet we can 
choose an arbitrarily small $\varepsilon$, and the sine series will converge to the 
very large value $1/\varepsilon$, in spite of the fact that it has to start from the 
value zero at the point x = 0. 
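
The procedure can be tried numerically for the cosine series of log x on $[0, \pi]$. The sketch rests on a step that is not spelled out in the text and should be read as an assumption of this illustration: integrating (4-5.7) by parts gives $a_k = -\frac{2}{\pi k}\,\mathrm{Si}(k\pi)$ for $k \ge 1$, where Si denotes the sine integral, so that term k of the formally differentiated series is $\frac{2}{\pi}\,\mathrm{Si}(k\pi)\sin kx$. The function names and the Simpson quadrature are likewise illustrative choices.

```python
import math

def si_at_multiples_of_pi(k_max, panels=64):
    """Return [Si(0), Si(pi), ..., Si(k_max*pi)], Si being the sine integral.

    Each interval [(k-1)*pi, k*pi] is integrated with Simpson's rule
    (panels must be even) and the results are accumulated.
    """
    def sinc(t):
        return 1.0 if t == 0.0 else math.sin(t) / t

    h = math.pi / panels
    values = [0.0]
    for k in range(1, k_max + 1):
        left = (k - 1) * math.pi
        s = sinc(left) + sinc(left + math.pi)
        for j in range(1, panels):
            s += (4.0 if j % 2 else 2.0) * sinc(left + j * h)
        values.append(values[-1] + s * h / 3.0)
    return values

def derivative_series(x, m, si, smoothed=True):
    """Partial sum of order m of the differentiated cosine series of log x.

    Term k is (2/pi) * Si(k*pi) * sin(k*x); with smoothed=True it is
    multiplied by the factor sigma_k = sin(pi*k/m) / (pi*k/m) of (4-6.10).
    """
    total = 0.0
    for k in range(1, m):
        t = math.pi * k / m
        weight = math.sin(t) / t if smoothed else 1.0
        total += weight * si[k] * math.sin(k * x)
    return 2.0 / math.pi * total

si = si_at_multiples_of_pi(200)
# Columns: x, 1/x, smoothed sum, formal (unsmoothed) sum.
for x in (0.5, 1.0, 2.0):
    print(x, 1.0 / x, derivative_series(x, 200, si),
          derivative_series(x, 200, si, smoothed=False))
```

The smoothed column should lie close to 1/x, while the formal column changes erratically when m is changed by one unit.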

If the same process is applied several times, we have a simple 
method of obtaining trigonometric expansions for functions which 
are not integrable in themselves but which are the $\mu$th derivatives 
of an absolutely integrable function. In this case we start with the 
generating function, obtain its Fourier series, differentiate it $\mu$ 
times formally, and then apply the factors $\sigma_k^{\mu}$ to the formal coefficients 
of the differentiated function. The modified series will converge to 
the $\mu$th derivative of the original function at every point where this 
derivative exists. By this method even a function of the type $x^{-\mu}$ 
with arbitrarily large positive $\mu$ can be expanded into a trigonometric 
series and the error can be made arbitrarily small at any point 
excluding the origin, although the classical Fourier series does not 
exist for these functions. 


9. Smoothing of the Gibbs oscillations by the $\sigma$ factors. Application 
of the $\sigma$ factors can serve not only to transform a divergent Fourier 
series into a convergent series, but also to speed up the slow 
convergence of a Fourier series. Since any function can be conceived as 
the derivative of its integral, the reduction of the error in the 
derivative by the factor m must show its beneficial effect on the convergence 
of any given Fourier series. 

The slow convergence of a Fourier series is particularly 
undesirable if a point of discontinuity is involved. The effect of this 
discontinuity is that the series oscillates around the true function with 
amplitudes which decrease only very slowly with increasing number 
of terms. The high-frequency oscillations of the truncated series 
around the true function are always present, as the error term (6.3) 
has shown. Ordinarily, the amplitudes of these oscillations are 
sufficiently small to be of no major consequence. In the case of a 
discontinuity, however, they are very noticeable and interfere with 
an efficient harmonic synthesis of the square wave.¹ 

Fejér’s arithmetic mean method completely eliminates the Gibbs 
oscillations. The approach to the square wave now occurs in the 
form of completely smooth, monotonic functions which remain 
constantly below the curve. The method of the $\sigma$ factors does not 
eliminate the oscillations of the Fourier series but cuts down their 
amplitudes. The former amplitude of 0.08949 of the jump in the 
first oscillation is reduced to 0.01187; the next minimum of 0.04859 
is reduced to only 0.00473, and so on. 
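
The size of the first overshoot can be estimated numerically for the square wave (4-7.1), whose jump at x = 0 equals 1. The sketch below evaluates the partial sum of order m of the series (4-7.2) on a grid, once unweighted, once with the $\sigma_k$ factors of (4-6.10), and once with the weights $1 - k/m$ that correspond to the arithmetic mean of the partial sums $s_0, \ldots, s_{m-1}$. The order m = 100, the grid, and the function names are assumptions of this illustration.

```python
import math

def no_weight(k, m):
    """Plain truncated Fourier sum."""
    return 1.0

def sigma_weight(k, m):
    """Attenuation factor (4-6.10)."""
    t = math.pi * k / m
    return math.sin(t) / t

def fejer_weight(k, m):
    """Weight of harmonic k in the arithmetic mean of s_0, ..., s_{m-1}."""
    return 1.0 - k / m

def square_wave_sum(x, m, weight):
    """Partial sum of order m of the series (4-7.2), each harmonic multiplied by a weight."""
    total = 0.0
    for k in range(1, m, 2):
        total += weight(k, m) * math.sin(k * x) / k
    return 2.0 / math.pi * total

def overshoot(m, weight, points=2000):
    """Largest excess over the plateau value 1/2 on a grid in (0, pi/2]."""
    step = 0.5 * math.pi / points
    return max(square_wave_sum((i + 1) * step, m, weight) for i in range(points)) - 0.5

m = 100
print("truncated sum  :", overshoot(m, no_weight))
print("sigma smoothing:", overshoot(m, sigma_weight))
print("arithmetic mean:", overshoot(m, fejer_weight))
```

For an order of this size the first two numbers should come out near the values 0.08949 and 0.01187 quoted above, and the third should be negative, since the arithmetic mean stays below the plateau.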

The accompanying figure plots the course of the original truncated
Fourier sum with the Gibbs oscillations, and likewise the course of
the arithmetic mean and the result of the σ smoothing. We notice
that the method of the arithmetic mean, while avoiding the undesirable
oscillations of the Fourier series, has the disadvantage that it
approaches the asymptotic value very slowly and that it starts with a
relatively small slope. The method of the σ factors yields a faster
increase of the approximating function, together with a sharp turn


1 The “Gibbs phenomenon”; cf. [2], Chapter IX; see also [1], p. 105.




[Figure. F = original square wave; S = smoothed square wave; M = arithmetic mean method.]




after the maximum level has been reached. It is true that the oscilla- 
tions still exist, but their amplitude is strongly attenuated and quickly 
damped out. The fidelity of the approximation is thus markedly 
better than that of the arithmetic mean method, while in comparison 
with the original series we can state that we have practically elimi- 
nated the cumbersome oscillations of the Gibbs phenomenon, 
at the cost of a somewhat less steep ascent at the beginning of the 
curve. 


10. General character of the σ smoothing. Application of the
σ_k factors to the classical Fourier coefficients is a somewhat simpler
process than taking the arithmetic mean of the partial sums, and is
less damaging from the fidelity point of view. The power of this
smoothing process is equivalent to that of the arithmetic mean
process. One can show that the outstanding properties of
Fejér’s kernel (2.14) hold also for the kernel associated with the
σ process. The multiplication of the classical Fourier coefficients
by the factors σ_k has thus the same effect on the convergence
of the series as taking the arithmetic means of the partial sums;
convergence is obtained at all points where a definite limit can
reasonably be expected.

In contrast to summation by arithmetic means of the partial sums, the
method of the σ factors is not merely a technical device for summing
an infinite series in a definite manner but has an invariant
significance in relation to the given function f(x). Replacement of
the D process by the 𝒟_m process of § 6 can be approached from an
altogether different viewpoint. Instead of changing the process of
differentiation we will now change the function involved in the
operation but leave the operation itself unchanged. We can write

\[ \mathcal{D}_m f(x) = D\bar{f}(x) \tag{4-10.1} \]

where

\[ \bar{f}(x) = \frac{m}{2\pi} \int_{-\pi/m}^{+\pi/m} f(x+t)\,dt \tag{4-10.2} \]

This means that the function f(x) is replaced by a new function f̄(x)
which smooths the original function by taking at every point the
arithmetic mean of f(x) around the point x, between the limits ±π/m.
Instead of saying that we multiply the coefficients of the truncated
Fourier series by σ_k, we can also say that we operate with the truncated




Fourier series of a modified function f̄(x) which is associated with
f(x) in a definite way, viz., on the basis of local averaging. If m is
sufficiently large, the local averaging extends to a very small region
around the point x only, and thus the distorting effect of this operation
is negligibly small, except in the neighborhood of a singularity,
where the smoothing of the original roughness of the function is
actually desirable.
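
The equivalence between local averaging and multiplication of the coefficients can be checked on a single harmonic. The sketch below assumes the σ factor in the form sin(kπ/m)/(kπ/m) and evaluates the mean over ±π/m by a simple midpoint rule; the helper `local_mean` and the sample values of m, k, x are illustrative choices, not taken from the text.

```python
import math

def local_mean(f, x, m, panels=2000):
    """Midpoint-rule mean of f over [x - pi/m, x + pi/m], the local average of section 10."""
    a = math.pi / m
    h = 2 * a / panels
    return sum(f(x - a + (j + 0.5) * h) for j in range(panels)) / panels

m, k, x = 12, 5, 0.7
averaged = local_mean(lambda t: math.cos(k * t), x, m)
sigma_k = math.sin(k * math.pi / m) / (k * math.pi / m)   # assumed form of the sigma factor

# The locally averaged harmonic should agree with sigma_k times the original harmonic.
print(averaged, sigma_k * math.cos(k * x))
```

Since the averaging acts on every harmonic separately, the same factor appears in front of each term of the truncated series.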


We thus see that at the cost of a relatively small distortion of the 
function the convergence of the Fourier series can be considerably 
improved in all instances in which the convergence of the original 
series is not satisfactory. At a regular point of the function the 
distortion thus introduced can be characterized in terms of the second 
derivative. By the Taylor series we have 


\[ f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \cdots \tag{4-10.3} \]




Considering h as the variable and integrating between the limits
±π/m according to (2), we obtain

\[ \bar{f}(x) = f(x) + \frac{\pi^2}{6m^2} f''(x) \tag{4-10.4} \]
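
The integration step may be written out as follows. This is a worked restatement under the assumption that the Taylor series is cut off after the second-order term; the term linear in h drops out because the limits are symmetric.

```latex
\bar{f}(x) \approx \frac{m}{2\pi}\int_{-\pi/m}^{+\pi/m}
      \Big[ f(x) + h f'(x) + \frac{h^2}{2} f''(x) \Big]\,dh
  = f(x) + 0 + \frac{m}{2\pi}\cdot\frac{f''(x)}{2}\cdot\frac{2}{3}\Big(\frac{\pi}{m}\Big)^{3}
  = f(x) + \frac{\pi^2}{6m^2}\, f''(x)
```

The distortion thus decreases with the square of m, which is why the smoothing is harmless at regular points once m is moderately large.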


11. The method of trigonometric interpolation. The harmonic
analysis of functions is frequently handicapped by the difficulty of
obtaining the integrals (2.2). Even for functions of relatively simple
structure, actual evaluation of the integrals (2.2) may meet
insuperable difficulties; the integral of the product of a given function
with a trigonometric function of the type sin nx or cos nx is seldom
expressible in terms of simple functions. Hence the question has
to be raised as to what steps shall be taken for practical evaluation
of the Fourier coefficients a_k and b_k.

A similar problem arises in connection with empirically observed 
functions. The data are available in a finite set of points only, namely 
at the points of observation. In most practical cases the data points
x_α are equidistantly spaced. The same can be said of tabulated
functions. Here, too, the tabulation usually occurs in equidistant 
steps of the independent variable x. 

It is true that for intermediate points we can frequently interpolate 
the given fundamental data. This is particularly true for tabulated 
functions which can be given with sufficient accuracy to allow an 
effective interpolation at any point x of the given range. We are in a 
less favorable position, however, in relation to empirical functions, 
since here the mathematical law of the function is frequently not 
known; moreover, the “noise” superimposed on the true course of 
the function makes preliminary smoothing of the data necessary, 
which cannot be done with any degree of certainty. 

Under these circumstances we save a great deal in difficulties if we 
can eliminate the interpolation problem altogether by operating 
directly with the given discrete set of data. Very fortunately, 
harmonic analysis is exceedingly well suited to the nature of equi- 
distant data. The numerical procedure for obtaining the harmonic 
components of an unknown function given in equidistant intervals is 
simple and straightforward, and at the same time well convergent. 
The fundamental tool of the Fourier series is thus greatly extended 
in its usefulness since it is applicable to a discrete set of equidistant 
data no less than to a continuous set of data. 




The general problem of orthogonal expansions with respect to
discrete data may be formulated as follows. Let a function y = f(x)
be given at the discrete points x = x_α and let the functional values at
these points be denoted by

\[ y_\alpha = f(x_\alpha) \tag{4-11.1} \]

Let us approximate y = f(x) by a linear superposition of given
functions φ_k(x):

\[ \bar{y} = \sum_{k=1}^{m} c_k \varphi_k(x) \tag{4-11.2} \]


We assume that m, the number of given functions, is generally less
than n, the number of data, but the limiting case m = n shall not be
excluded. We also assume that the functions φ_k(x) are linearly
independent.

Generally the approximation ȳ at the data points x_α will not
coincide with the given functional values y_α. We characterize the
square of the average error of our approximation by forming the
sum of the squares of the residuals at the data points x = x_α.

\[ \eta^2 = \sum_{\alpha=1}^{n} (y_\alpha - \bar{y}_\alpha)^2 = \sum_{\alpha=1}^{n} \Big[ y_\alpha - \sum_{k=1}^{m} c_k \varphi_k(x_\alpha) \Big]^2 \tag{4-11.3} \]

We determine the “best” approximation by minimizing η² with
respect to the c_k. This demands that the partial derivative of η² with
respect to each c_i shall become zero.

\[ \sum_{\alpha=1}^{n} \Big[ y_\alpha - \sum_{k=1}^{m} c_k \varphi_k(x_\alpha) \Big] \varphi_i(x_\alpha) = 0 \tag{4-11.4} \]

This leads to a simultaneous system of linear equations for determination
of the c_k. The resulting system becomes greatly simplified if the
given functions φ_k(x) satisfy the following orthogonality conditions.

\[ \sum_{\alpha=1}^{n} \varphi_i(x_\alpha)\,\varphi_k(x_\alpha) = 0 \qquad (i \neq k) \tag{4-11.5} \]


In these conditions we recognize the earlier orthogonality conditions
(3.6) but now translated from the process of integration to the
process of summation. The geometrical term “orthogonality”
refers to the following concept of analytical geometry. Let us plot
in an imaginary Euclidean space of n dimensions the functional
values φ(x_1), φ(x_2), ···, φ(x_n) of an arbitrary function y = φ(x), as
rectangular components of a vector. The function φ(x), taken at the
data points, is thus represented by some vector of an n-dimensional
space. The given function y = f(x) represents one such vector. The
m functions φ_1(x), φ_2(x), ···, φ_m(x) represent m other vectors, which
in view of their linear independence span a linear subspace of
m dimensions. The approximation (2) has the significance of an
arbitrary vector of this linear subspace. The method of minimizing
the average error in the sense of minimizing the sum (3) has
geometrically the significance of minimizing the distance of the given
vector y from the subspace of the base vectors φ_1, φ_2, ···, φ_m. This
means that we project y on the subspace of the φ_i (cf. also V, 16).

Now the process of projection is greatly simplified if the base
vectors φ_k are mutually perpendicular. This means that
the “dot product” of any two vectors φ_i and φ_k vanishes.

\[ \varphi_i \cdot \varphi_k = \sum_{\alpha=1}^{n} \varphi_i(x_\alpha)\,\varphi_k(x_\alpha) = 0 \qquad (i \neq k) \tag{4-11.6} \]

Hence we are back at our previous orthogonality condition (5) but
now interpreted in the language of geometry.
As a consequence of this orthogonality condition the equation (4)
is simplified to

\[ c_i \sum_{\alpha=1}^{n} \varphi_i^2(x_\alpha) = \sum_{\alpha=1}^{n} y_\alpha \varphi_i(x_\alpha) \tag{4-11.7} \]

Hence we have obtained an explicit solution of the given least-square
problem (3) in the form

\[ c_i = \frac{y \cdot \varphi_i}{\varphi_i \cdot \varphi_i} = \frac{1}{N_i} \sum_{\alpha=1}^{n} y_\alpha \varphi_i(x_\alpha) \tag{4-11.8} \]

if we introduce the “norms” of the given vectors φ_i by putting

\[ N_i = \sum_{\alpha=1}^{n} \varphi_i^2(x_\alpha) \tag{4-11.9} \]
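
A small numerical instance may make the projection concrete. The sketch below uses three mutually orthogonal vectors in a space of n = 4 dimensions and an arbitrary data vector; these numbers are invented for illustration and do not come from the text. It forms the norms, the coefficients and the residual, and compares the squared residual with the sum of the squared data minus the sum of the terms N_i c_i².

```python
# Three mutually orthogonal base vectors sampled at n = 4 data points (illustrative).
phi = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
]
y = [2.0, 0.5, -1.0, 3.0]            # arbitrary data vector

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

norms = [dot(p, p) for p in phi]                         # N_i, cf. (11.9)
coeffs = [dot(y, p) / N for p, N in zip(phi, norms)]     # c_i, cf. (11.8)

# Projection of y on the subspace, and the residual vector.
y_bar = [sum(c * p[a] for c, p in zip(coeffs, phi)) for a in range(len(y))]
residual = [ya - yb for ya, yb in zip(y, y_bar)]
eta2 = dot(residual, residual)

# Squared residual obtained without forming the projection explicitly.
eta2_from_norms = dot(y, y) - sum(N * c * c for N, c in zip(norms, coeffs))
print(coeffs, eta2, eta2_from_norms)
```

No linear system has to be solved: each coefficient is obtained independently of the others, which is the practical gain of orthogonality.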


Once more we can call our function system φ_k “ortho-normal” if
the norm of each function φ_k(x), i.e., the square of the length of each
base vector φ_k, is normalized to 1. We then have an orthogonal set of
base vectors of length 1 which are particularly well suited for analytical
operations.

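Assuming such an ortho-normal set, the earlier equation (3.8)
appears now in the following form.

\[ \sum_{\alpha=1}^{n} \bar{y}_\alpha^2 = c_1^2 + c_2^2 + \cdots + c_m^2 \tag{4-11.10} \]

Moreover, the sum of the squares of the residuals, defined by (11.3),
now becomes

\[ \eta^2 = \sum_{\alpha=1}^{n} y_\alpha^2 - (c_1^2 + c_2^2 + \cdots + c_m^2) \tag{4-11.11} \]

For not-normalized but orthogonal systems the last two equations
have to be generalized as follows.

\[ \sum_{\alpha=1}^{n} \bar{y}_\alpha^2 = N_1 c_1^2 + N_2 c_2^2 + \cdots + N_m c_m^2 \tag{4-11.12} \]

\[ \sum_{\alpha=1}^{n} (y_\alpha - \bar{y}_\alpha)^2 = \sum_{\alpha=1}^{n} y_\alpha^2 - (N_1 c_1^2 + N_2 c_2^2 + \cdots + N_m c_m^2) \tag{4-11.13} \]

In the limiting case m = n our approximation becomes exact in
the data points, since now the m-dimensional subspace is extended
to the entire space. Then the two vectors y and ȳ coincide and we
have

\[ \sum_{\alpha=1}^{n} y_\alpha^2 = N_1 c_1^2 + N_2 c_2^2 + \cdots + N_n c_n^2 \tag{4-11.14} \]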

If the given functions φ_k(x) have complex values, the previous
operations have to be modified in the sense that the square φ² is to
be replaced by the product φφ* (cf. § 4). Correspondingly in the
last three equations, c_i² is to be replaced by c_i c_i*. The orthogonality
condition (6) now becomes φ_i · φ_k* = 0 and the formula (8) for the
expansion coefficient c_i becomes

\[ c_i = \frac{1}{N_i} \sum_{\alpha=1}^{n} y_\alpha \varphi_i^*(x_\alpha) \tag{4-11.15} \]

with

\[ N_i = \sum_{\alpha=1}^{n} \varphi_i(x_\alpha)\,\varphi_i^*(x_\alpha) \tag{4-11.16} \]




We now consider the following fundamental trigonometric
identity

\[ \tfrac{1}{2} e^{-in\theta} + e^{-i(n-1)\theta} + \cdots + e^{i(n-1)\theta} + \tfrac{1}{2} e^{in\theta} = \sin n\theta \,\cot \frac{\theta}{2} \tag{4-11.17} \]

and define our functions φ_k(x) as the trigonometric powers

\[ \varphi_k(x) = e^{ikx} \tag{4-11.18} \]

while the position of the data points x_α shall be chosen as follows:

\[ x_\alpha = \alpha \frac{\pi}{n} \qquad [\alpha = -n, -(n-1), \cdots, (n-1), n] \tag{4-11.19} \]

The range of x is thus normalized to [−π, +π] and divided into 2n
equal intervals. Hence the number of data points is 2n + 1. In our
problem,

\[ \varphi_j(x_\alpha)\,\varphi_k^*(x_\alpha) = e^{i(j-k)\alpha\pi/n} \tag{4-11.20} \]

If now we sum over α with the understanding that the two limiting
terms α = ±n are taken with half weight, we obtain the left side of
the identity (17), with

\[ \theta = (j-k) \frac{\pi}{n} \tag{4-11.21} \]

Since j − k is an integer ≠ 0 for j ≠ k, the right side of (17) vanishes
and we obtain

\[ \varphi_j \cdot \varphi_k^* = {\sum_{\alpha=-n}^{+n}}' \varphi_j(x_\alpha)\,\varphi_k^*(x_\alpha) = 0 \tag{4-11.22} \]

The prime in Σ′ refers to the fact that the two limiting terms of the
sum are taken with half weight.
The normalization factors of the functions φ_k(x) are likewise
simple. Putting j = k, we obtain

\[ \varphi_k \cdot \varphi_k^* = {\sum_{\alpha=-n}^{+n}}' \varphi_k(x_\alpha)\,\varphi_k^*(x_\alpha) = 2n \tag{4-11.23} \]
In conclusion we can say that the least-square solution of the problem
of fitting the given data by an expansion of the form

\[ \bar{y} = \sum_{k=-m}^{+m} c_k e^{ikx} \qquad (m < n) \tag{4-11.24} \]

is given explicitly by the formula

\[ c_k = \frac{1}{2n} {\sum_{\alpha=-n}^{+n}}' y_\alpha e^{-ikx_\alpha} \tag{4-11.25} \]
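
The orthogonality relations and the coefficient formula (11.25) can be tried out directly. The following sketch assumes n = 4 and a test function 1 + 2 cos x + 3 sin 2x chosen for illustration; the helper names `primed_dot` and `coefficient` are hypothetical. The two limiting data points receive half weight, as the prime on the sum demands.

```python
import cmath
import math

n = 4
alphas = range(-n, n + 1)
x = {a: a * math.pi / n for a in alphas}                      # data points, cf. (11.19)
weight = {a: 0.5 if abs(a) == n else 1.0 for a in alphas}     # half weight at the two ends

def primed_dot(j, k):
    """Half-weighted sum of phi_j times the conjugate of phi_k over the data points."""
    return sum(weight[a] * cmath.exp(1j * (j - k) * x[a]) for a in alphas)

def coefficient(y, k):
    """Expansion coefficient c_k according to (11.25)."""
    return sum(weight[a] * y[a] * cmath.exp(-1j * k * x[a]) for a in alphas) / (2 * n)

# Test data: 2 cos x contributes c_1 = c_-1 = 1; 3 sin 2x contributes c_2 = -1.5i, c_-2 = +1.5i.
y = {a: 1 + 2 * math.cos(x[a]) + 3 * math.sin(2 * x[a]) for a in alphas}
c = {k: coefficient(y, k) for k in range(-2, 3)}

print(abs(primed_dot(1, 3)), primed_dot(2, 2).real)
print(c)
```

Because the test function contains no frequency above 2, the computed c_k reproduce its complex Fourier coefficients without error.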
The same expansion can be given more conveniently in real form,
separating the sine and the cosine terms. Every function can be
written as the sum of an even and an odd function.

\[ f(x) = \tfrac{1}{2}[f(x) + f(-x)] + \tfrac{1}{2}[f(x) - f(-x)] \tag{4-11.26} \]

Hence it suffices to consider a function g(x) with the symmetry
property

\[ g(-x) = g(x) \tag{4-11.27} \]

and a function h(x) with the symmetry property

\[ h(-x) = -h(x) \tag{4-11.28} \]

In the first case c_{−k} = c_k, while in the second case c_{−k} = −c_k.

Hence in the first case we obtain

\[ g(x) = \tfrac{1}{2} a_0 + a_1 \cos x + \cdots + a_m \cos mx \tag{4-11.29} \]

(last term gets weight ½ if m = n) with

\[ a_k = \frac{2}{n} {\sum_{\alpha=0}^{n}}' g_\alpha \cos k\alpha \frac{\pi}{n} \tag{4-11.30} \]

In the second case we obtain

\[ h(x) = b_1 \sin x + \cdots + b_m \sin mx \tag{4-11.31} \]

with

\[ b_k = \frac{2}{n} \sum_{\alpha=1}^{n-1} h_\alpha \sin k\alpha \frac{\pi}{n} \tag{4-11.32} \]


In the limiting case m = n we have enough functions to fit the
given data exactly. In this case ȳ(x) is no longer an approximation
but an interpolation of the given data. We obtain an analytical
expression in the form of a trigonometric polynomial of lowest
order which fits the given data exactly and which fits the functional
values between with a certain accuracy. How great this accuracy
is depends on the given function. The power of trigonometric
interpolation lies in the fact that with increasing n the approximation
ȳ(x) approximates y(x) with ever-diminishing oscillations. For
every function of bounded variation, the trigonometric interpolation
converges unlimitedly to the given y = f(x) at every point of the
given range, as the number of data points increases to infinity.

This behavior of the trigonometric interpolation is in marked
contrast to the interpolation of equidistant data by powers. While
we can always find a polynomial of 2nth order which will fit 2n + 1
equidistant data of a given finite range exactly, the error oscillations
between the data points need not have the tendency to diminish in
amplitude as n increases. Around the end of the range, the error
oscillations may increase to infinity, thus giving an arbitrarily large
error everywhere except in the data points. This happens with such a
simple and regular function as


\[ y = \frac{1}{1 + 25x^2} \]

in the range [−1, +1], as shown by O. Runge.¹

The trigonometric kind of interpolation is entirely free of this 
peculiar difficulty. The error oscillations do not have the tendency 
to increase toward the end points of the range but continue to 
maintain the same order of magnitude throughout the range. The 
trigonometric kind of interpolation is thus both analytically and 
practically vastly superior to the ordinary polynomial interpolation 
for data which are given equidistantly. 
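
The contrast between the two kinds of interpolation can be observed numerically. The sketch below is an illustration under stated assumptions: it takes the classical Runge example 1/(1 + 25x²) on [−1, +1], fits 2n + 1 equidistant ordinates once by a polynomial (Lagrange form) and once by an even trigonometric polynomial of period 2 (the cosine expansion with half weights at the two ends), and records the largest error on a fine grid. All function names are hypothetical.

```python
import math

def runge(x):
    return 1.0 / (1.0 + 25.0 * x * x)

def lagrange(xs, ys, x):
    """Value at x of the interpolating polynomial through the points (xs, ys)."""
    total = 0.0
    for i, xi in enumerate(xs):
        w = ys[i]
        for j, xj in enumerate(xs):
            if j != i:
                w *= (x - xj) / (xi - xj)
        total += w
    return total

def cosine_interp(ys, n, x):
    """Even trigonometric interpolant on [-1, 1]; ys[al] is the ordinate at al/n, al = 0..n."""
    def a(k):
        s = sum((0.5 if al in (0, n) else 1.0) * ys[al] * math.cos(k * al * math.pi / n)
                for al in range(n + 1))
        return 2.0 * s / n
    g = 0.5 * a(0) + 0.5 * a(n) * math.cos(n * math.pi * x)
    return g + sum(a(k) * math.cos(k * math.pi * x) for k in range(1, n))

def max_errors(n, samples=801):
    nodes = [al / n for al in range(-n, n + 1)]          # 2n + 1 equidistant points
    values = [runge(t) for t in nodes]
    half = [runge(al / n) for al in range(n + 1)]        # the function is even
    grid = [-1 + 2 * j / (samples - 1) for j in range(samples)]
    poly = max(abs(lagrange(nodes, values, t) - runge(t)) for t in grid)
    trig = max(abs(cosine_interp(half, n, t) - runge(t)) for t in grid)
    return poly, trig

poly5, trig5 = max_errors(5)
poly10, trig10 = max_errors(10)
print(poly5, trig5)
print(poly10, trig10)
```

Doubling the number of data points makes the polynomial error larger, while the trigonometric error shrinks.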


12. Interpolation by sine functions. Let f(x) be an odd function.
The formula (11.31) comes into operation, with m = n − 1.

\[ h(x) = b_1 \sin x + b_2 \sin 2x + \cdots + b_{n-1} \sin (n-1)x \tag{4-12.1} \]

with

\[ b_k = \frac{2}{n} \sum_{\alpha=1}^{n-1} y_\alpha \sin k\alpha \frac{\pi}{n} \tag{4-12.2} \]

The term b_n sin nx cannot be added, since the function sin nx has
zeros at all the data points, which leaves b_n undetermined. Actually
the number of data points is in effect not more than n − 1, since
h(0) = 0 because of the odd character of the function, while the value
h(π) cannot come into evidence in view of the fact that all functions
sin kx vanish at the point x = π. The expansion (11.31) will have


1 Cf. the discussion of the Runge phenomenon in V, 15. 




very slow convergence if h(x) does not satisfy the boundary condition

\[ h(\pi) = 0 \tag{4-12.3} \]

It is thus advisable to apply the sine analysis to a given function only
if that function vanishes at the points x = 0 and x = π. If these
conditions are not satisfied, we can subtract a linear trend from the
given function f(x) by putting

\[ h(x) = f(x) - (\alpha + \beta x) \tag{4-12.4} \]

We determine α and β from the conditions h(0) = h(π) = 0,
obtaining

\[ \alpha = f(0), \qquad \beta = \frac{f(\pi) - f(0)}{\pi} \tag{4-12.5} \]

The sine analysis of h(x) has now satisfactory convergence.

If it is unavoidable that the sine analysis of the original f(x) shall
be found, it is still advisable to proceed in the given manner and then
add the theoretically known sine expansion of the function α + βx.
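
The benefit of removing the linear trend shows up in the size of the higher coefficients. The following sketch assumes the test function e^x on [0, π] with n = 16 (both illustrative choices) and compares one coefficient of the sine analysis before and after the subtraction; `sine_coeffs` is a hypothetical helper implementing the sum (12.2).

```python
import math

def sine_coeffs(y, n):
    """b_1 .. b_(n-1) from the interior ordinates y[1..n-1]; y[0] and y[n] are not used."""
    return [2.0 / n * sum(y[a] * math.sin(k * a * math.pi / n) for a in range(1, n))
            for k in range(1, n)]

n = 16
xs = [a * math.pi / n for a in range(n + 1)]
f = [math.exp(x) for x in xs]

alpha = f[0]                              # constants of the linear trend, cf. (12.5)
beta = (f[n] - f[0]) / math.pi
h = [fa - (alpha + beta * x) for fa, x in zip(f, xs)]

b_raw = sine_coeffs(f, n)
b_detrended = sine_coeffs(h, n)
print(abs(b_raw[7]), abs(b_detrended[7]))   # the coefficient with k = 8 in both analyses
```

The coefficient of the trend-free function is smaller by more than an order of magnitude, so far fewer terms are needed for the same accuracy.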

Computationally the coefficients b_k of a sine analysis can be found
as follows. We construct a matrix with the elements

\[ b_{\alpha\beta} = \sin \alpha\beta \frac{\pi}{n} \tag{4-12.6} \]

We write the given data y_α in a row

    (0), y_1, y_2, ···, y_{n−1}, (0)

and multiply this row with the successive rows of the b_{αβ} matrix.
The coefficients thus obtained are then multiplied by the constant
2/n. This gives the successive amplitudes

    b_1, b_2, ···, b_{n−1}

of the sine analysis (1).
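
The procedure can be followed on a function whose amplitudes are known in advance. The sketch below assumes n = 6 and the test function 2 sin x − 0.5 sin 3x, both chosen for illustration; it builds the matrix (12.6), multiplies the data row with its successive rows, and applies the factor 2/n.

```python
import math

n = 6
# Multiplication matrix (12.6): the element (alpha, beta) is sin(alpha * beta * pi / n).
B = [[math.sin(a * b * math.pi / n) for b in range(1, n)] for a in range(1, n)]

def h(x):
    """Test function with known amplitudes (an illustrative choice)."""
    return 2.0 * math.sin(x) - 0.5 * math.sin(3 * x)

y = [h(a * math.pi / n) for a in range(1, n)]            # the data row y_1 ... y_(n-1)

# Row-by-row products, followed by the constant factor 2/n.
b = [2.0 / n * sum(row[j] * y[j] for j in range(n - 1)) for row in B]
print([round(v, 12) for v in b])
```

The amplitudes 2 and −0.5 reappear in the first and third place, the remaining coefficients vanishing within rounding errors.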

The construction of the matrix (6) is simplified if we first set it up
in coded form. We use the code numbers 1, 2, ···, n − 1, and start
out with the “guiding line”

\[ 0,\ 1,\ 2,\ \cdots,\ n-1,\ -0,\ -1,\ -2,\ \cdots,\ -(n-1),\ 0 \tag{4-12.7} \]

We imagine that these elements are written along the periphery of a
circle, the last zero coinciding with the first zero. We have thus a
complete cycle which has no beginning and no end.

We now pick out every first, every second, every third, ··· element of
the guiding line and write them as the successive rows of a matrix,
omitting the element 0 at the beginning and at the end. Every row
has thus n − 1 elements. For example, the case n = 6 leads to the
following construction.

Guiding line:

    0, 1, 2, 3, 4, 5, −0, −1, −2, −3, −4, −5, 0

Coded matrix:

          y_1   y_2   y_3   y_4   y_5

           1     2     3     4     5
           2     4    −0    −2    −4
    B =    3    −0    −3     0     3        (4-12.8)
           4    −2     0     4    −2
           5    −4     3    −2     1


The significance of these code numbers is as follows: We replace the
code number k by sin (kπ/n). Hence in our case (n = 6) the code
numbers have to be replaced by the following actual numbers:

    0 = 0
    1 = sin 30° = 0.5
    2 = sin 60° = 0.86603
    3 = sin 90° = 1
    4 = sin 120° = 0.86603
    5 = sin 150° = 0.5


The multiplication matrix is now constructed. If the data row
y_1, ···, y_5 is multiplied by the successive rows of the matrix B, we
obtain 5 quantities b′_1, ···, b′_5. These b′_i are now multiplied by
2/n = 1/3, thus obtaining the final Fourier coefficients

\[ b_i = \frac{2}{n}\, b'_i \tag{4-12.9} \]
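
The coded construction lends itself to a mechanical check. In the sketch below the guiding line is stored as pairs (code number, sign), a representation chosen here only so that +0 and −0 remain distinguishable; row k is obtained by picking every kth element of the cycle, and the decoded matrix is then printed in coded form for n = 6. The function name is hypothetical.

```python
import math

def coded_sine_matrix(n):
    """Rows 1..n-1 of the coded matrix, picked from the cyclic guiding line (12.7)."""
    # One full cycle of the guiding line; its closing 0 coincides with the first element.
    line = [(c, +1) for c in range(n)] + [(c, -1) for c in range(n)]
    return [[line[(k * beta) % (2 * n)] for beta in range(1, n)] for k in range(1, n)]

n = 6
coded = coded_sine_matrix(n)

# Decoding: the code number c is replaced by sin(c*pi/n), the sign being carried along.
B = [[sign * math.sin(code * math.pi / n) for code, sign in row] for row in coded]

for row in coded:
    print(" ".join(("-" if s < 0 else " ") + str(c) for c, s in row))
```

Only the n/2 distinct sine values have to be looked up; everything else is bookkeeping of positions and signs.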


13. Interpolation by cosine functions. Now let f(x) be an even
function. Then formula (11.29) comes into operation, with the
following slight modification: the expansion now extends up to
m = n, but the last term receives the factor ½.

\[ g(x) = \tfrac{1}{2} a_0 + a_1 \cos x + \cdots + a_{n-1} \cos (n-1)x + \tfrac{1}{2} a_n \cos nx \tag{4-13.1} \]

with

\[ a_k = \frac{2}{n} {\sum_{\alpha=0}^{n}}' y_\alpha \cos k\alpha \frac{\pi}{n} \tag{4-13.2} \]




The multiplication matrix is now composed of the elements

\[ a_{\alpha\beta} = \cos \alpha\beta \frac{\pi}{n} \tag{4-13.3} \]


The coded matrix is identical with the previous matrix (12.8) with
the only difference that the columns 0 and n cannot be omitted. The
zero column, which belongs to the ordinate y_0 = g(0), is composed
of all zeros, while the column n, which belongs to the ordinate
y_n = g(π), is composed of the elements 0, −0, 0, −0, ···. Moreover,
there is a zero row, composed of all zeros, corresponding to the
coefficient ½a_0, which did not exist in the case of the sine analysis;
similarly there is an nth row, composed of alternate zeros, 0, −0, 0,
···, corresponding to the last coefficient ½a_n. The complete
multiplication matrix A thus becomes (for the example n = 6)

         ½y_0   y_1   y_2   y_3   y_4   y_5  ½y_6

           0     0     0     0     0     0     0
           0     1     2     3     4     5    −0
           0     2     4    −0    −2    −4     0
    A =    0     3    −0    −3     0     3    −0
           0     4    −2     0     4    −2     0
           0     5    −4     3    −2     1    −0
           0    −0     0    −0     0    −0     0


The code numbers have now the following significance: the code
number k has to be replaced by cos (kπ/n). Hence

    0 = cos 0° = 1
    1 = cos 30° = 0.86603
    2 = cos 60° = 0.5
    3 = cos 90° = 0
    4 = cos 120° = −0.5
    5 = cos 150° = −0.86603

After obtaining the coefficients a′_k we multiply by 2/n in order to
obtain the final coefficients a_k.

\[ a_k = \frac{2}{n}\, a'_k \]




In the chosen example (n = 6) all the a′_k are divided by 3.

The weight factor ½. The cosine analysis shows a peculiarity which
was not encountered in the sine analysis; this is the weight factor
½ at the two ends. We must remember that the two limiting ordinates
y_0 and y_n enter all calculations with the weight ½. Moreover, the two
limiting terms of the cosine expansion are likewise multiplied by the
factor ½. These irregularities do not occur in the sine expansion since
there all these terms are zero.
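
Both occurrences of the weight ½ can be seen at work in a short computation. The sketch assumes n = 6 and the even test function 1 + 2 cos x + 0.5 cos 6x (an illustrative choice); the helper names are hypothetical. The analysis applies half weight to the two limiting ordinates, the synthesis applies the factor ½ to the first and the last term.

```python
import math

def cosine_coeffs(y, n):
    """a_0 .. a_n according to (13.2); the end ordinates y[0] and y[n] get weight 1/2."""
    w = [0.5] + [1.0] * (n - 1) + [0.5]
    return [2.0 / n * sum(w[a] * y[a] * math.cos(k * a * math.pi / n) for a in range(n + 1))
            for k in range(n + 1)]

def cosine_series(a, n, x):
    """Expansion (13.1); the first and the last term receive the factor 1/2."""
    inner = sum(a[k] * math.cos(k * x) for k in range(1, n))
    return 0.5 * a[0] + inner + 0.5 * a[n] * math.cos(n * x)

n = 6
xs = [al * math.pi / n for al in range(n + 1)]
y = [1.0 + 2.0 * math.cos(x) + 0.5 * math.cos(6 * x) for x in xs]

a = cosine_coeffs(y, n)
restored = [cosine_series(a, n, x) for x in xs]
print([round(v, 12) for v in a])
```

Here ½a_0 reproduces the constant 1 and ½a_6 the amplitude 0.5 of the highest harmonic; without the half weights neither the coefficients nor the restored ordinates would come out correctly.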

Symmetry properties of the multiplication matrices A and B. Both
matrices A and B, defined by (13.3) and (12.6), are symmetric,
because of the interchangeability of α and β. They have, however,
some valuable additional symmetry properties, caused by the
symmetry properties of the sine and cosine functions in the four
quadrants of the circle. We can benefit from these symmetry properties
in reducing the number of multiplications by the factor 2 and even the
factor 4. We notice from the interpretation of the code numbers that
the elements k and n − k are interrelated. The number of independent
elements is only n/2, instead of n. Moreover, the columns k and
n − k, and likewise the rows k and n − k are interrelated, differing
from each other only in sign. If we consider all the even columns
separately and all the odd columns separately, and do the same with
the even and odd rows, the entire matrix can be reduced to one-fourth
of its previous size. We break the matrix into four smaller
matrices of n/2 rows and columns, operating with the sums and
differences of the ordinates y_α and y_{n−α}. By this procedure we gain
by the factor 4 in the number of multiplications but we lose by the
necessity of writing down a larger number of partial results. We
separate the coefficients of even and odd order, obtaining each
coefficient as the sum or difference of two partial results.
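
The first stage of this reduction, the folding of the columns, can be sketched as follows for the sine analysis. The sketch assumes an even n and arbitrary data invented for illustration; it uses the sums y_α + y_{n−α} for the coefficients of odd order and the differences y_α − y_{n−α} for those of even order, and compares the result with the direct evaluation. The second stage (the corresponding treatment of the rows) is not carried out here.

```python
import math

def sine_coeffs_direct(y, n):
    """b_1 .. b_(n-1) by the full sum over alpha = 1 .. n-1."""
    return [2.0 / n * sum(y[a] * math.sin(k * a * math.pi / n) for a in range(1, n))
            for k in range(1, n)]

def sine_coeffs_folded(y, n):
    """The same coefficients from sums (odd k) or differences (even k) of y[a] and y[n-a]."""
    half = n // 2                                   # n is assumed to be even
    plus = [y[a] + y[n - a] for a in range(1, half)] + [y[half]]   # middle ordinate once
    minus = [y[a] - y[n - a] for a in range(1, half)] + [0.0]
    out = []
    for k in range(1, n):
        folded = plus if k % 2 else minus
        out.append(2.0 / n * sum(folded[a - 1] * math.sin(k * a * math.pi / n)
                                 for a in range(1, half + 1)))
    return out

n = 12
y = [0.0] + [math.exp(-0.3 * a) * (a % 5 + 1) for a in range(1, n)] + [0.0]
direct = sine_coeffs_direct(y, n)
folded = sine_coeffs_folded(y, n)
print(max(abs(p - q) for p, q in zip(direct, folded)))
```

Each coefficient now requires only n/2 multiplications instead of n − 1, at the price of forming the sums and differences beforehand.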

Numerical checks. Numerical checks of the computations are
provided by the fact that the Fourier synthesis of the obtained
coefficients has to restore the original data. The orthogonality of
the matrices A and B has the consequence that while the product of
the y_α with the matrices A and B gives the Fourier coefficients a_k and
b_k (except for the factor 2/n), the product of the Fourier coefficients a_k
with the matrix A, or the coefficients b_k with the matrix B, restores the
original ordinates y_α.
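
For the sine analysis the check takes only a few lines. The sketch below assumes n = 6 and an arbitrary data row invented for illustration: the data are multiplied with the matrix B and the factor 2/n (analysis), and the resulting coefficients are multiplied once more with the same matrix (synthesis).

```python
import math

n = 6
B = [[math.sin(a * b * math.pi / n) for b in range(1, n)] for a in range(1, n)]
y = [0.3, -1.2, 2.5, 0.7, -0.4]                  # arbitrary ordinates y_1 ... y_5

# Analysis: data row times the rows of B, then the factor 2/n.
b = [2.0 / n * sum(B[k][a] * y[a] for a in range(n - 1)) for k in range(n - 1)]

# Synthesis: coefficients times the same matrix must give back the ordinates.
restored = [sum(B[a][k] * b[k] for k in range(n - 1)) for a in range(n - 1)]
print(restored)
```

Any disagreement between the restored and the original ordinates beyond the rounding level points to an error in the table of code numbers or in the arithmetic.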


1 For further numerical details and practical examples cf. [10]. 




14. Harmonic analysis of equidistant data. Let a periodic function
of the period 2l be given in the range [−l, +l] by observing it at the
2n + 1 equidistant points

\[ x_\alpha = \alpha \frac{l}{n} \qquad (\alpha = -n, \cdots, +n) \tag{4-14.1} \]

This function can be analyzed by laying a trigonometric polynomial
of lowest order through the given ordinates. The functions of this
expansion are

\[ \cos k \frac{\pi}{l} x, \qquad \sin k \frac{\pi}{l} x \tag{4-14.2} \]

They take the place of the previous functions cos kx, sin kx, which
are adjusted to the range [−π, +π]. The complete analysis of the
a_k and b_k coefficients occurs exactly according to the previous
algorithm, using the ordinates ½[f(x_α) + f(−x_α)] for the cosine
analysis, and the ordinates ½[f(x_α) − f(−x_α)] for the sine analysis.
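
The complete analysis may be sketched as follows. The code assumes l = 2.5, n = 8 and a test function with known harmonic content, all chosen for illustration; the function name `analyze` and the storage layout of the ordinates are hypothetical. The half period l enters only through the position of the data points.

```python
import math

def analyze(f_vals, n):
    """Harmonic analysis of 2n+1 ordinates, f_vals[alpha + n] = f(x_alpha), alpha = -n..n.

    Returns (a, b) with a[0..n] and b[0..n-1]; b[0] is always zero and not used.
    """
    even = [0.5 * (f_vals[n + al] + f_vals[n - al]) for al in range(n + 1)]
    odd = [0.5 * (f_vals[n + al] - f_vals[n - al]) for al in range(n + 1)]
    w = [0.5] + [1.0] * (n - 1) + [0.5]             # half weight at the two ends
    a = [2.0 / n * sum(w[al] * even[al] * math.cos(k * al * math.pi / n)
                       for al in range(n + 1)) for k in range(n + 1)]
    b = [2.0 / n * sum(odd[al] * math.sin(k * al * math.pi / n) for al in range(1, n))
         for k in range(n)]
    return a, b

l, n = 2.5, 8

def f(x):
    """Test function of period 2l with known amplitudes (an illustrative choice)."""
    return 1.0 + math.cos(math.pi * x / l) + 2.0 * math.sin(2 * math.pi * x / l)

a, b = analyze([f(al * l / n) for al in range(-n, n + 1)], n)
print(a[0] / 2, a[1], b[2])
```

The three printed numbers are the constant term, the amplitude of the first cosine harmonic and the amplitude of the second sine harmonic of the test function.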

In many problems of applied analysis the basic function is not 
periodic in itself but defined in a given finite interval of x, let us say 
between x = 0 and x = l. The method of trigonometric expansion
may be employed as a tool of interpolating the given data by giving 
an analytical expression for the entire function f(x), observed or 
tabulated in a discrete set of equidistant points only. Such an 
analytical expression may be of importance for evaluating f(x) at 
points which lie between the data points. But even more frequently 
the value of such an analytical expression may lie in its operational 
advantages. Many operations of advanced analysis can be per- 
formed with exponential functions relatively easily, while the same 
operation with a given algebraic or transcendental function can only 
formally be indicated; the actual numerical answer would require a 
prohibitive amount of work. In such cases it is of inestimable value 
if the given function is first replaced parexically by a trigonometric 
expansion which gives a close approximation of that function. For 
this purpose the method of trigonometric interpolation can frequently 
be employed. 

In such problems it is of decisive importance that the approximating
trigonometric series shall have good convergence. This requires
that function and first derivative shall return to the same values at
the beginning and the end of the chosen period. Since this condition
is usually not satisfied at the two end points of the given range,
we cannot choose the given interval as the full period of a harmonic
analysis. If we choose the interval [0, l] as the half period of a harmonic
analysis, defining f(x) in the negative half as an even function
f(−x) = f(x), we avoid the discontinuity of the function at the two
end points ±l of the range, since now f(−l) = f(l). However,
f′(−l) = −f′(l), and thus we have generally a discontinuity in the
first derivative. We obtain better convergence by subtracting a linear
function α + βx from the given f(x), thus operating with a function

\[ h(x) = f(x) - (\alpha + \beta x) \tag{4-14.3} \]


which vanishes at the two end points x = 0 and x = l [cf. (12.5),
replacing π by l]. If we now reflect this h(x) as an odd function
h(−x) = −h(x) and consider 2l as the full period of the harmonic
analysis, we have continuity of function and first derivative at the two
end points of the period, since

\[ h(-l) = h(l) = 0, \qquad h'(-l) = h'(l) \tag{4-14.4} \]

The expansion of h(x) into a pure sine series

\[ h(x) = \sum_{k=1}^{n-1} b_k \sin k \frac{\pi}{l} x \tag{4-14.5} \]

where the b_k are obtained according to § 12, will now give satisfactory
convergence, since the coefficients b_k decrease with the third
power of k (cf. § 5).

The importance of the method of trigonometric interpolation of 
equidistant data can hardly be overestimated. It makes the mighty 
tool of Fourier analysis accessible to difficult functions whose 
Fourier coefficients, based on the original definition as definite 
integrals [cf. (2.2)] are not calculable. The method of trigonometric 
interpolation replaces these integrals by a simple summation process, 
carried out over a relatively small number of equidistantly spaced 
ordinates (cf. also VI, 18).


15. The error of trigonometric interpolation. It is not always of 
advantage to use the classical Fourier coefficients as a unique frame 
of reference in discussing the nature of trigonometric expansions. 
It is true that the truncated Fourier series (3.1) is at every value of m 




a best approximation in the sense of minimizing the average error by 
the method of least squares. However, the average error is not always 
the best gauge of the error of a certain approximation. Moreover, if 
the coefficients of this best approximation are not accessible because 
of the excessive labor involved in their evaluation, we pay a relatively 
small price if we replace the classical series by a modified series which 
is easily calculated, and whose error is nevertheless not essentially 
worse than the previous error. It will thus be of interest to compare 
the accuracy of the trigonometric series obtained by interpolation 
with the accuracy of the truncated Fourier series of the same number 
of terms. 

Let us first assume that the given f(x) does not contain more
harmonic components than n + 1 cosine terms and n − 1 sine terms.
Then the given 2n + 1 equidistant ordinates (their actual number is
2n because of the boundary condition f(x_{−n}) = f(x_n)) determine
f(x) exactly, and the series obtained by trigonometric interpolation
coincides with the truncated Fourier series of 2n terms, both of these
series giving f(x) without any error.

Let us now assume that the given f(x) contains further harmonic
components, the frequency spectrum going up to 2n instead of n.
Hence f(x) contains terms proportional to

\[ \sin (n+k)x, \quad \cos (n+k)x \qquad (k = 1, 2, \cdots, n) \tag{4-15.1} \]


Now the Fourier coefficients of the truncated Fourier series take no 
account of the presence of these higher harmonics. They strictly 
separate the contribution of the various frequencies, thus giving the 
exact harmonic analysis of the given function. 

The series obtained by trigonometric interpolation behaves
differently. It is sensitive to the presence of these higher harmonics
by converting them into lower frequencies. The error of the
interpolation process must be of the form

\[ \eta_n(x) = (\sin nx)\,u(x) \tag{4-15.2} \]

since the error is zero at the data points, i.e., at the zeros of sin nx.
Now the trigonometric identities

\[ \begin{aligned} \sin (n+k)x + \sin (n-k)x &= 2 \cos kx \,\sin nx \\ \cos (n+k)x - \cos (n-k)x &= -2 \sin kx \,\sin nx \end{aligned} \tag{4-15.3} \]

show that the frequency n + k is equivalent to the frequency n − k
in all the data points. The trigonometric interpolation thus converts
the frequency n + k into the frequency n − k and records an amplitude
of this frequency with full strength and no phase shift in the
cosine case, with full strength and phase shift of 180° in the sine case.

The same conversion phenomenon recurs between the frequencies 
2n and 3n, since the frequency 2n + k is equivalent to a frequency 
2n — k = n + (n — k) and thus to the frequency n — (n — k) =k 
and so on. 
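
The conversion of frequencies can be exhibited on a single harmonic. The sketch below assumes n = 8 and k = 3 (illustrative values): it first confirms that the frequencies n + k and n − k cannot be told apart in the data points, and then submits a pure sine harmonic of frequency n + k to the sine analysis (12.2).

```python
import math

n, k = 8, 3
xs = [al * math.pi / n for al in range(-n, n + 1)]

# In the data points cos(n+k)x equals cos(n-k)x, and sin(n+k)x equals -sin(n-k)x.
cos_gap = max(abs(math.cos((n + k) * x) - math.cos((n - k) * x)) for x in xs)
sin_gap = max(abs(math.sin((n + k) * x) + math.sin((n - k) * x)) for x in xs)

# Sine analysis of a pure harmonic of frequency n + k, observed at alpha = 0 .. n.
y = [math.sin((n + k) * al * math.pi / n) for al in range(n + 1)]
b = [2.0 / n * sum(y[al] * math.sin(j * al * math.pi / n) for al in range(1, n))
     for j in range(1, n)]

print(cos_gap, sin_gap)
print(b[n - k - 1])          # the coefficient of frequency n - k
```

The harmonic of frequency 11 is recorded as a harmonic of frequency 5 with reversed sign, i.e., with full strength and a phase shift of 180°, while all other coefficients vanish.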

This “contamination” from the part of the higher frequencies 
shows that the classical Fourier coefficients are superior to the 
coefficients obtained by interpolation, if we are interested in the true 
harmonic spectrum of the given function. If, however, our aim is 
merely to approximate the given f(x) by a harmonic series, we cannot 
say a priori that the error of the truncated series will necessarily be 
smaller than the error of the interpolated series. Both errors are of 
the same order of magnitude, since the error function η_n(x) of the
interpolated series can be conceived as a spectrally garbled version 
of the error of the truncated series. The estimated maximum error 
is the same in both cases. The actual maximum error may be in 
favor of the one or the other series. 

Comparison of the Fourier series with the series obtained by 
trigonometric interpolation deserves more than passing attention. 
Derivation of the Fourier series by elementary means seems to 
indicate that the coefficients a_k, b_k of the infinite trigonometric series
representing f(x) cannot be anything else but the customary definite
integrals (2.2). And thus the erroneous notion prevails that only the
Fourier series gives “exact” results, while other series, whose
coefficients have been obtained by different means, are only “approximate”
in nature. The mistake can be traced to the historical fact that
a certain type of infinite limit process received overwhelming emphasis 
in the evolution of mathematics and thus acquired a feature of 
uniqueness which leads to wrong interpretations. 

If we write down an infinite ortho-normal series of the form

\[ f(x) = c_1 \varphi_1(x) + c_2 \varphi_2(x) + \cdots \tag{4-15.4} \]

and then, multiplying by φ_k(x) and integrating term by term, obtain

\[ c_k = \int_a^b f(x)\,\varphi_k(x)\,dx \tag{4-15.5} \]

the process seems unique. It is important, however, to keep certain
tacit assumptions in mind which were not clear in Fourier’s time
but which became completely clarified when the exact limit
concept emerged during the nineteenth century. Two remarks are of
particular interest.

In the first place we have to realize that the “equal” sign in 
equation (4) is not used in the legitimate sense and is meaningful 
only if we know its significance. Lagrange objected to the thesis 
that a nonanalytical function can be represented by an infinite 
superposition of sine and cosine functions, arguing that these
functions are analytical functions, and any number of them remains 
still analytical. The proper answer to Lagrange’s objection is that 
the equal sign in an infinite expansion of the type (4) does not mean 
real equality. It means only that the right side, as we add more and 
more terms, comes nearer and nearer to the left side, and by adding 
up enough terms we can make the error in absolute value as small as 
we wish, but not zero. Hence the Fourier series, like all infinite 
expansions, does not give exact results, but only arbitrarily close 
results. It represents a never ending approximation process. 

The second remark is concerned with the special type of infinite 
approximation which is achieved by an equation of the form (4), 
that is, an equation of the “‘dot-dot-dot type.” We add up more and 
more terms which means in exact analysis that the step from n to 
n + 1 in this process occurs as follows. We add one more term, 
without changing anything in the previous terms. This, however, is by 
no means necessary. We could add one more term and at the same 
time change the coefficients of the previous terms. If we do so, we 
lose in simplicity, since now we have to give n + 1 new coefficients 
instead of one new coefficient. A one-dimensional sequence of terms 
changes to a triangular matrix. But this sacrifice in simplicity may 
lead to simplicity in another direction. For example, the usually 
inaccessible definite integrals which are needed for the evaluation of 
the Fourier coefficients may be replaced by the easily calculated sums 
which are demanded in the process of trigonometric interpolation. 
Moreover, the validity of the trigonometric expansion may be 
extended to a class of functions which go far beyond the class of 
absolutely integrable functions (cf. § 8). These series differ from
the Fourier series in the manner of approximation but not in the degree 
of accuracy which is of the “not absolute but arbitrarily close” type 
in all cases in which the series converges at all (cf. also VII, 5). 
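
The remark that the earlier coefficients change as terms are added can be checked numerically. The following Python sketch is an illustration only and not part of the original text. It assumes the even test function f(x) = |x| on the range from −π to +π and forms its first cosine coefficient by equidistant sums, with the two end ordinates taken at half weight (an assumed convention); the function names are hypothetical.

```python
import math

def interp_cos_coeff(f, k, n):
    """k-th cosine coefficient of an even f from n + 1 equidistant ordinates on [0, pi].

    The two end ordinates enter with half weight (trapezoidal sum).
    """
    total = 0.0
    for j in range(n + 1):
        x = j * math.pi / n
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * f(x) * math.cos(k * x)
    return 2.0 * total / n

# Fourier coefficient a_1 of |x| from the definite integral:
# (2/pi) * integral from 0 to pi of x cos x dx = -4/pi
exact_a1 = -4.0 / math.pi

for n in (4, 8, 16, 64):
    a1 = interp_cos_coeff(abs, 1, n)
    print(n, a1, a1 - exact_a1)
```

The coefficient obtained from the sums depends on n and approaches the Fourier coefficient only gradually, which is the "triangular" scheme of coefficients described above.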




16. Interpolation by Chebyshev polynomials. The disadvantage 
of trigonometric interpolation of equidistant data is usually the fact 
that f(x) is basically not a periodic function of x, and is made periodic 
only by definition. The best we can hope for is continuity of function 
and first derivative at the beginning and end of the period. The 
discontinuities in the higher derivatives make the resulting series 
relatively slowly convergent. This slow convergence is avoided by a 
modified form of the Fourier series which is frequently of eminent 
importance. Let the range of f(x) be normalized to [−1, +1]. Let
f(x) be analytic inside this range and on the boundary, but without any
further boundary conditions. We now make the transformation from
x to a new variable $\theta$, by the definition

$$x = \cos\theta \tag{4-16.1}$$

The given f(x) now becomes $f(\cos\theta) = \varphi(\theta)$ and is thus transformed
into a genuinely periodic function of the new variable $\theta$.

If x changes between −1 and +1, the angle variable $\theta$ changes
between 0 and $\pi$. Since, however, a change of $\theta$ to $-\theta$ leaves x
unchanged, we possess the function $\varphi(\theta)$ at any value of $\theta$. It is an
even function of $\theta$.

$$\varphi(-\theta) = \varphi(\theta) \tag{4-16.2}$$

Moreover, if f(x) can be differentiated any number of times with
respect to x, the transformed function $\varphi(\theta)$ can be differentiated any
number of times with respect to $\theta$, at any point of the range, inclusive
of the boundaries.

Under these circumstances the Fourier expansion of $\varphi(\theta)$ will have
much faster convergence than if the original f(x) had been expanded
into a Fourier series. This Fourier expansion has only cosine terms
since $\varphi(\theta)$ is an even function of $\theta$.

$$\varphi(\theta) = \tfrac{1}{2}\gamma_0 + \sum_{k=1}^{\infty} \gamma_k \cos k\theta \tag{4-16.3}$$

Let us interpret this series in terms of the original variable x.

The functions $\cos k\theta$, if written in the variable x, become polynomials
of x, called the Chebyshev polynomials $T_k(x)$ (cf. V, 20).
They are alternatingly even and odd functions of x which can be
generated on the basis of the recurrence relation

$$T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x) \tag{4-16.4}$$

starting with $T_0(x) = 1$, $T_1(x) = x$.


The higher polynomials are tabulated in Table V. 
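
As an illustration (not part of the original text), the polynomials can be generated numerically from the standard three-term recurrence $T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x)$ and compared with $\cos k\theta$. The function name below is hypothetical, and the sketch assumes nothing beyond the two starting polynomials 1 and x.

```python
import math

def chebyshev_T(k, x):
    """Value of T_k(x) from the recurrence T_{j+1} = 2x T_j - T_{j-1}."""
    t_prev, t_curr = 1.0, x          # T_0(x) = 1, T_1(x) = x
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

theta = 0.7
for k in range(6):
    # T_k(cos theta) should reproduce cos(k theta)
    print(k, chebyshev_T(k, math.cos(theta)), math.cos(k * theta))
```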
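The expansion (3), if written in the original variable x, becomes

$$f(x) = \tfrac{1}{2}\gamma_0 + \sum_{k=1}^{\infty} \gamma_k T_k(x) \tag{4-16.5}$$

The coefficients of this expansion are determined by the customary
definite integrals.

$$\gamma_k = \frac{2}{\pi} \int_0^{\pi} \varphi(\theta) \cos k\theta \, d\theta \tag{4-16.6}$$

Translated into the variable x we obtain

$$\gamma_k = \frac{2}{\pi} \int_{-1}^{+1} f(x)\, T_k(x)\, \frac{dx}{\sqrt{1 - x^2}} \tag{4-16.7}$$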


Once more we can raise the objection that evaluation of these
definite integrals will be frequently beyond our capacities. Hence we
will change over to the method of trigonometric interpolation,
described before in § 13. For this purpose the basic functional
values $y_\alpha$ have to be given at the points

$$\theta_\alpha = \alpha\,\frac{\pi}{n} \qquad (\alpha = 0, 1, 2, \ldots, n) \tag{4-16.8}$$

This means in the variable x that the data points are placed according
to the law

$$x_\alpha = \cos \alpha\,\frac{\pi}{n} \tag{4-16.9}$$


This is a strongly nonuniform distribution of the points of interpolation,
which crowds the points near to the two end points $x = \pm 1$ of
the range. The geometrical interpretation of the law (16.9) is that
the semicircle of the radius 1 is divided into n equal parts and the
points projected down on the x axis. The equidistant distribution in
the angle variable $\theta$ causes a strongly nonequidistant distribution of
the points in the projection.

The functional values

$$y_\alpha = f(x_\alpha) = f\!\left(\cos \alpha\,\frac{\pi}{n}\right) \tag{4-16.10}$$

have to be given in these n + 1 generally irrational points. Apart
from this inconvenience, the interpolation itself is an easily performed
routine process, since we evaluate the coefficients with the help of the
matrix (13.3), discussed before in § 13. We thus obtain the expansion

$$f(x) = \tfrac{1}{2}a_0 + a_1 T_1(x) + \cdots + \tfrac{1}{2}a_n T_n(x) \tag{4-16.11}$$


which can now be rearranged into an ordinary power series. The 
convergence of this series is far superior to the convergence of the 
ordinary Taylor series. In fact, the Taylor expansion may diverge 
completely, while the convergence of the expansion (11) is guaranteed 
by the fact that any continuous function of bounded variation can be 
expanded into a uniformly convergent Fourier series. 
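
The interpolation process can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's tabulated scheme: the matrix (13.3) is not reproduced here, and the sketch assumes the usual cosine-sum rule $a_k = (2/n) \sum'' y_\alpha \cos(k\alpha\pi/n)$, in which the first and last ordinates enter with half weight. The function names and the test function $e^x$ are hypothetical choices.

```python
import math

def chebyshev_interpolation_coeffs(f, n):
    """Coefficients a_0..a_n from the ordinates y_alpha = f(cos(alpha*pi/n)).

    Assumed cosine-sum rule: a_k = (2/n) * sum'' y_alpha cos(k*alpha*pi/n),
    where the double prime halves the first and last terms.
    """
    y = [f(math.cos(alpha * math.pi / n)) for alpha in range(n + 1)]
    coeffs = []
    for k in range(n + 1):
        s = 0.0
        for alpha in range(n + 1):
            w = 0.5 if alpha in (0, n) else 1.0
            s += w * y[alpha] * math.cos(k * alpha * math.pi / n)
        coeffs.append(2.0 * s / n)
    return coeffs

def evaluate_expansion(coeffs, x):
    """Evaluate a_0/2 + a_1 T_1(x) + ... + a_n T_n(x)/2 with T_k(x) = cos(k arccos x)."""
    n = len(coeffs) - 1
    theta = math.acos(x)
    total = 0.5 * coeffs[0] + 0.5 * coeffs[n] * math.cos(n * theta)
    for k in range(1, n):
        total += coeffs[k] * math.cos(k * theta)
    return total

coeffs = chebyshev_interpolation_coeffs(math.exp, 8)
for x in (-1.0, -0.3, 0.5, 1.0):
    print(x, evaluate_expansion(coeffs, x), math.exp(x))
```

With only nine ordinates the expansion reproduces the data at the points of interpolation and stays close to the function between them.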

The nonequidistant distribution of the data points has a highly 
beneficial effect on the convergence of the resulting interpolation. 
While equidistant polynomial interpolation gives error oscillations 
which are strongly increased around the two ends of the range, 
proper crowding of the data points toward the two end points 
prevents the oscillations from becoming damaging. The error now 
oscillates with the same order of magnitude throughout the range. 
The resulting power series is thus distinguished by the fact that it 
approximates the given function with an absolutely smaller maximum 
error than approximations obtained by other polynomials. More- 
over, the process of interpolation avoids evaluation of definite 
integrals, and is numerically simple and straightforward. Thus the
process discussed here has many practical applications. It translates
the outstanding analytical properties of the trigonometric type of 
interpolation into the realm of powers. 
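
The effect of crowding the data points toward the ends can be seen in a small experiment. The following Python sketch is an illustration, not part of the original text: it assumes the test function $1/(1 + 25x^2)$ and n = 16, passes a polynomial through the data in Lagrange's form (a technique swapped in here for brevity), and compares equidistant points with the points of the law (4-16.9). All names are hypothetical.

```python
import math

def lagrange_value(nodes, values, x):
    """Value at x of the polynomial through the points (nodes[j], values[j])."""
    total = 0.0
    for j, xj in enumerate(nodes):
        term = values[j]
        for m, xm in enumerate(nodes):
            if m != j:
                term *= (x - xm) / (xj - xm)
        total += term
    return total

def max_interpolation_error(f, nodes, samples=2001):
    """Largest |f - p| on a fine equidistant grid of [-1, 1]."""
    values = [f(x) for x in nodes]
    worst = 0.0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        worst = max(worst, abs(f(x) - lagrange_value(nodes, values, x)))
    return worst

def runge(x):
    """Assumed test function with poles close to the range."""
    return 1.0 / (1.0 + 25.0 * x * x)

n = 16
equidistant = [-1.0 + 2.0 * j / n for j in range(n + 1)]
crowded = [math.cos(j * math.pi / n) for j in range(n + 1)]   # law (4-16.9)

err_equidistant = max_interpolation_error(runge, equidistant)
err_crowded = max_interpolation_error(runge, crowded)
print(err_equidistant, err_crowded)
```

In this experiment the equidistant error is dominated by large oscillations near the two ends, while the crowded points keep the error of comparable size throughout the range.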


17. The Fourier integral. While the series named after Fourier 
was already well established at the time of Fourier—based on the 
pioneering work of J. Bernoulli, Euler, and Lagrange—the Fourier 
integral is the undisputed discovery of Fourier. It has become one of 
the most powerful tools of mathematical analysis, and is particularly 
fundamental in all problems pertaining to the input-output relation 
of electric networks. Fourier found that decomposition of arbitrary 
functions into harmonic components remains possible even if the 
realm of the function f(x) extends on both sides to infinity. In this 
case the fundamental frequency converges to zero, and thus the 
process of summation changes into one of integration. Moreover, 
the limits of the integrals which define the Fourier coefficients are no
longer $\pm\pi$ but $\pm\infty$.

The Fourier series analyzes a function of a definite finite range in 
terms of sine and cosine functions of given frequencies. If we extend 
the given function beyond its range of definition, one of two things 
can happen. The function may be a truly periodic function, or may
exist in a finite interval $[-l, +l]$ only and we may force periodicity on
f(x) in order to make the Fourier series applicable for its representation.
In the latter case the periodicity of the function is not given by
nature, but is employed as a mathematical artifice only. If we use a 
physical instrument such as a wave analyzer for determination of the 
harmonic components of the function, we will in the first case actually 
obtain the coefficients of the Fourier series as measurable physical 
quantities. In the second case the situation is quite different. The 
wave analyzer does not recognize the given finite range of f(x). The 
variable x represents now the time t, and the wave analyzer takes
account of the given function not only during the finite time interval
2l but during all times. And thus the question arises, What happens
to the harmonic analysis if the fundamental period of the analysis is 
not specified to a definite quantity but can become arbitrarily large? 

We start with the function y = f(x), defined in the range between
$-l$ and $+l$, satisfying the Dirichlet conditions. This range can be
transformed to the normal range $\pm\pi$ by the scale transformation
$\xi = \pi x/l$. Going back to the original variable x, we can write the
Fourier series, now formulated for arbitrary limits $\pm l$. We will use
the mathematically most convenient complex form of the Fourier
series (cf. 2.4).

$$f(x) = \sum_{k=-\infty}^{+\infty} c_k\, e^{ik\pi x/l} \tag{4-17.1}$$

with

$$c_k = \frac{1}{2l} \int_{-l}^{+l} f(x)\, e^{-ik\pi x/l}\, dx \tag{4-17.2}$$

The customary “real” form of the series arises if we put

$$c_k = \tfrac{1}{2}(a_k - i b_k), \qquad c_{-k} = \tfrac{1}{2}(a_k + i b_k) \tag{4-17.3}$$


and combine the terms of the subscripts k and —k. 

Although the function f(x) was originally given in the realm $\pm l$
only, there is no reason why we should not enlarge the range to a
more extended interval $\pm L$. Outside the original range we will
define f(x) as zero.

[Figure: y = f(x), different from zero only between $-l$ and $+l$, shown on the enlarged range from $-L$ to $+L$.]


The new function y = f(x), defined in the enlarged range, can
again be expanded into a Fourier series. The formulas (1) and (2)
hold again, replacing l by L.

$$f(x) = \sum_{k=-\infty}^{+\infty} c_k\, e^{ik\pi x/L} \tag{4-17.4}$$

$$c_k = \frac{1}{2L} \int_{-l}^{+l} f(x)\, e^{-ik\pi x/L}\, dx \tag{4-17.5}$$


Although the new series (4) analyzes the function f(x) in entirely new
frequencies and entirely new coefficients, it approaches nevertheless
the same f(x) for any x value between $\pm l$, while outside of that range
(up to the bounds $\pm L$), the new series approaches the value zero.

Let us analyze the harmonic contents of the function f(x). If a
sine or a cosine function is written in the form $\sin 2\pi\nu t$, $\cos 2\pi\nu t$,
we call $\nu$ (the number of vibrations per second) the “frequency” of
the harmonic vibrations. The harmonic functions of the first series
(1) can be written in the form

$$\cos k\,\frac{2\pi x}{2l} + i \sin k\,\frac{2\pi x}{2l}$$

Hence the frequencies present in our first analysis are

$$\nu_1 = \frac{1}{2l}, \quad \nu_2 = \frac{2}{2l}, \quad \ldots, \quad \nu_k = \frac{k}{2l}, \quad \ldots$$

The corresponding frequencies present in our second analysis become

$$\nu_1 = \frac{1}{2L}, \quad \nu_2 = \frac{2}{2L}, \quad \ldots, \quad \nu_k = \frac{k}{2L}, \quad \ldots$$

If L is twice as large as l, the fundamental frequency becomes one-half
of the previous fundamental frequency. Hence the harmonics
1, 2, 3, 4, … of the previous analysis become now the harmonics 2, 4, 6, 8, …,
and our complete analysis contains twice as many frequencies as
before.

In order to study the distribution of the harmonic components, we
will plot them in a graph. As abscissa we choose the frequency to
which they belong. Hence the chart of a harmonic analysis looks as
follows, separating real and imaginary parts of $c_k$.

[Figure: line spectrum of the harmonic components; the real and imaginary parts of $c_k$ are plotted as ordinates against the frequency.]

If the same analysis is performed in the second case, we get more
lines, since more frequencies are present. We will agree to omit the
constant factor 1/2L in (5), and merely plot the quantities

$$\gamma_k = \int_{-l}^{+l} f(x)\, e^{-ik\pi x/L}\, dx \tag{4-17.6}$$

Then

$$c_k = \frac{1}{2L}\,\gamma_k \tag{4-17.7}$$
The amplitudes $\gamma_k$ have the advantage that we need not change them
as we make L larger and larger; we merely have to fill in newer and
newer lines. In fact we notice that our graph merely picks out
certain definite ordinates of a universal function. Let us define the
following function of the continuous variable $\nu$.

$$F(\nu) = \int_{-l}^{+l} f(x)\, e^{-2\pi i \nu x}\, dx \tag{4-17.8}$$

This function contains all possible harmonic components of f(x),
no matter how small or large L may be.

$$\gamma_k = F\!\left(\frac{k}{2L}\right) \tag{4-17.9}$$

[Figure: the continuous curves $[F(\nu) + F(-\nu)]/2$ and $i\,[F(\nu) - F(-\nu)]/2$, from which the spectral lines pick out their ordinates.]
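
The statement that the amplitudes of every analysis are ordinates of one universal function can be verified numerically. The following Python sketch is an illustration only: it assumes the test function f(x) = 1 between ±l (zero outside), whose transform is $\sin(2\pi\nu l)/(\pi\nu)$, and evaluates the integral (4-17.6) by the midpoint rule for several choices of L. The function names are hypothetical.

```python
import cmath
import math

def box(x, l=1.0):
    """Assumed test function: f(x) = 1 for |x| <= l, zero outside."""
    return 1.0 if abs(x) <= l else 0.0

def gamma_k(k, L, l=1.0, steps=4000):
    """gamma_k = integral over [-l, l] of f(x) exp(-i k pi x / L) dx (midpoint rule)."""
    h = 2.0 * l / steps
    total = 0.0 + 0.0j
    for j in range(steps):
        x = -l + (j + 0.5) * h
        total += box(x, l) * cmath.exp(-1j * k * math.pi * x / L)
    return total * h

def F(nu, l=1.0):
    """Closed-form transform of the box: sin(2 pi nu l) / (pi nu)."""
    if nu == 0.0:
        return 2.0 * l
    return math.sin(2.0 * math.pi * nu * l) / (math.pi * nu)

for L in (1.0, 2.0, 4.0):
    for k in (1, 2, 3):
        print(L, k, gamma_k(k, L).real, F(k / (2.0 * L)))
```

Doubling L leaves the old lines where they were (harmonic k becomes harmonic 2k) and fills in new ordinates of the same curve between them.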


It is a remarkable fact that, no matter how irregular f(x) was, the new
function $F(\nu)$ is always a continuous and even analytical function
(i.e., differentiable to any degree) of $\nu$. The original f(x) may have
any number of discontinuities or other irregularities, but the new
$F(\nu)$ is nevertheless analytical for all (real or complex) values of $\nu$.
It is called the “Fourier transform” of the original function f(x). It
does not resemble the original function at all; it is merely associated
with it, somewhat as the logarithm of a number is associated with the
original number.

Let us now see what happens if we go with L to larger and larger 
values, increasing the period of our analysis gradually to infinity. 
Then the lines we have to fill in get constantly denser and denser, and 
in the limit, as L grows to infinity, there is no longer discrimination 
in favor of any specific frequencies, but all frequencies are equally 
represented. The previous line spectrum changes in the limit to a
continuous spectrum. The function $F(\nu)$ now gives in its totality the
distribution of the harmonic amplitudes of f(x).

What can we say about the Fourier synthesis, expressed in the
form of the Fourier series? This series is in itself in very close relation
to the transform function $F(\nu)$. Equations (4) and (5) can be
combined in terms of the Fourier transform $F(\nu)$.

$$f(x) = \frac{1}{2L} \sum_{k=-\infty}^{+\infty} F\!\left(\frac{k}{2L}\right) e^{2\pi i k x/2L} \tag{4-17.10}$$

We define the following function of the frequency $\nu$:

$$G(\nu) = F(\nu)\, e^{2\pi i \nu x} \tag{4-17.11}$$

Then

$$f(x) = \frac{1}{2L} \sum_{k=-\infty}^{+\infty} G\!\left(\frac{k}{2L}\right) \tag{4-17.12}$$

or, putting

$$\frac{1}{2L} = \varepsilon \tag{4-17.13}$$

we can write

$$f(x) = \varepsilon \sum_{k=-\infty}^{+\infty} G(k\varepsilon) \tag{4-17.14}$$

[Figure: the ordinates of $G(\nu)$ at the equidistant abscissas …, −2ε, −ε, 0, ε, 2ε, 3ε, 4ε, 5ε, … which participate in the Fourier sum.]

(The figure assumes that x is a given constant. The fact that the
values of $G(\nu)$ are complex is discarded for the purpose of the
illustration.) The larger the fundamental interval 2L, the denser
become the ordinates of $G(\nu)$ which participate in the Fourier sum.
Let us now increase L to infinity. This means that $\varepsilon$ tends to zero.
By the fundamental theorem of integral calculus the sum (14)
approaches a definite limit as $\varepsilon$ decreases to zero. This limit is the
total area under the function $y = G(\nu)$.

$$f(x) = \int_{-\infty}^{+\infty} G(\nu)\, d\nu = \int_{-\infty}^{+\infty} F(\nu)\, e^{2\pi i \nu x}\, d\nu \tag{4-17.15}$$
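
The passage from the sum (4-17.14) to the integral can be followed numerically. The sketch below is an illustration under stated assumptions: the test function is the triangle f(x) = 1 − |x|/l between ±l, whose transform is $l\,[\sin(\pi\nu l)/(\pi\nu l)]^2$, and the infinite sum is cut off at a hypothetical frequency `nu_max`. The function names are not from the original text.

```python
import math

def F_triangle(nu, l=1.0):
    """Transform of the assumed test function f(x) = 1 - |x|/l on [-l, l]."""
    if nu == 0.0:
        return l
    u = math.pi * nu * l
    return l * (math.sin(u) / u) ** 2

def fourier_sum(x, L, nu_max=200.0, l=1.0):
    """epsilon * sum of G(k epsilon) over |k epsilon| <= nu_max, with epsilon = 1/(2L).

    F is real and even here, so the imaginary parts of G cancel in pairs.
    """
    eps = 1.0 / (2.0 * L)
    k_max = int(nu_max / eps)
    total = F_triangle(0.0, l)
    for k in range(1, k_max + 1):
        nu = k * eps
        total += 2.0 * F_triangle(nu, l) * math.cos(2.0 * math.pi * nu * x)
    return eps * total

for L in (1.0, 2.0, 8.0):
    print(L, fourier_sum(0.25, L), fourier_sum(1.5, L))
```

Inside the original range (x = 0.25) every choice of L returns nearly the same value 0.75. At x = 1.5 the coarse sum with L = l still shows the periodic repetition of f(x), while the finer sums approach zero, as the integral demands.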


The Fourier sum approaches more and more the Fourier integral.
While the Fourier transform $F(\nu)$ resolves the given f(x) into its
harmonic components, the Fourier integral synthesizes these harmonic
components to the original function. Observe the remarkable
reciprocity of the two equations:

$$F(\nu) = \int_{-l}^{+l} f(x)\, e^{-2\pi i \nu x}\, dx$$

$$f(x) = \int_{-\infty}^{+\infty} F(\nu)\, e^{2\pi i \nu x}\, d\nu$$

The only discrepancy is that in the first integral the limits are $\pm l$, in
the second $\pm\infty$. But this happened only because we have assumed
that f(x) was given only between $\pm l$ and was zero outside of that
interval. We can stretch the realm of defining f(x) to an arbitrarily
large interval, and we can gradually approach infinity without
invalidating our results. In the limit we obtain

$$F(\nu) = \int_{-\infty}^{+\infty} f(x)\, e^{-2\pi i \nu x}\, dx \tag{4-17.16}$$

$$f(x) = \int_{-\infty}^{+\infty} F(\nu)\, e^{2\pi i \nu x}\, d\nu \tag{4-17.17}$$


The reciprocity is now complete. The only thing we lose by the
infinite limits of the first equation is that $F(\nu)$ is no longer an analytical
or even necessarily continuous function of $\nu$. Nor is it permissible to
substitute anything but real values for $\nu$. But $F(\nu)$ still approaches a
definite limit, provided that f(x) is absolutely integrable, which
means the existence of

$$\int_{-\infty}^{+\infty} |f(x)|\, dx \tag{4-17.18}$$

and that f(x) is of bounded variation in every finite interval.
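
A simple numerical check of the pair (4-17.16), (4-17.17) is possible with a function that is its own transform. The sketch below is an illustration, not part of the original text; it assumes the test function $f(x) = e^{-\pi x^2}$, for which $F(\nu) = e^{-\pi\nu^2}$, and replaces the infinite limits by a hypothetical finite `half_width`.

```python
import math

def gaussian(x):
    """Assumed test function f(x) = exp(-pi x^2); its transform is exp(-pi nu^2)."""
    return math.exp(-math.pi * x * x)

def transform(f, nu, half_width=8.0, steps=4000):
    """Midpoint approximation of F(nu) = integral of f(x) exp(-2 pi i nu x) dx.

    Returns (real part, imaginary part); the infinite limits are replaced by
    +/- half_width, which is harmless for a rapidly decaying f.
    """
    h = 2.0 * half_width / steps
    re = im = 0.0
    for j in range(steps):
        x = -half_width + (j + 0.5) * h
        re += f(x) * math.cos(2.0 * math.pi * nu * x)
        im -= f(x) * math.sin(2.0 * math.pi * nu * x)
    return re * h, im * h

for nu in (0.0, 0.5, 1.0):
    print(nu, transform(gaussian, nu), gaussian(nu))
```

Since the test function is even, applying the same routine to the computed transform (with either sign of the exponent) leads back to the original function, which is the reciprocity in its simplest form.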

In order to display the characteristic similarities and dissimilarities 
of the Fourier series and the Fourier integral, we display once more 
the fundamental formulas in juxtaposition: 

Fourier series; discrete spectrum (frequencies 1/2l, 2/2l, …, k/2l, …):

$$f(x) = \sum_{k=-\infty}^{+\infty} c_k\, e^{2\pi i k x/2l}, \qquad c_k = \frac{1}{2l} \int_{-l}^{+l} f(x)\, e^{-2\pi i k x/2l}\, dx$$

Fourier integral; continuous spectrum (all frequencies between $-\infty$
and $+\infty$):

$$f(x) = \int_{-\infty}^{+\infty} F(\nu)\, e^{2\pi i \nu x}\, d\nu, \qquad F(\nu) = \int_{-\infty}^{+\infty} f(x)\, e^{-2\pi i \nu x}\, dx$$

Question: What is a negative frequency?

Answer: If we operate with positive frequencies only, every
frequency is associated with two functions, viz., $\sin 2\pi\nu x$ and
$\cos 2\pi\nu x$. This duplicity can be avoided by introducing positive and
negative frequencies and associating with $\nu$ the single function $e^{2\pi i \nu x}$.
An arbitrary real vibration is then always an interaction of the two
frequencies $+\nu$ and $-\nu$.
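
The answer can be made concrete with a few lines of Python. This is an illustration only; the amplitudes a, b and the frequency are arbitrary assumed values, and the function name is hypothetical. The two complex coefficients are those of (4-17.3).

```python
import cmath
import math

def real_vibration(a, b, nu, x):
    """a cos(2 pi nu x) + b sin(2 pi nu x), built from the pair of frequencies +nu and -nu."""
    c_plus = 0.5 * (a - 1j * b)      # coefficient of exp(+2 pi i nu x), cf. (4-17.3)
    c_minus = 0.5 * (a + 1j * b)     # coefficient of exp(-2 pi i nu x)
    return (c_plus * cmath.exp(2j * math.pi * nu * x)
            + c_minus * cmath.exp(-2j * math.pi * nu * x))

a, b, nu, x = 1.5, -0.7, 3.0, 0.123
z = real_vibration(a, b, nu, x)
direct = a * math.cos(2 * math.pi * nu * x) + b * math.sin(2 * math.pi * nu * x)
print(z, direct)   # the imaginary part of z vanishes up to rounding
```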


In practical applications of the Fourier transform it is frequently of
greater advantage to introduce the “angular frequency” $\omega = 2\pi\nu$
instead of the ordinary frequency $\nu$, and to write the formulas of
harmonic analysis and synthesis in the following form.¹

$$\begin{aligned} F(\omega) &= \int_{-\infty}^{+\infty} f(x)\, e^{-i\omega x}\, dx \\ f(x) &= \frac{1}{2\pi} \int_{-\infty}^{+\infty} F(\omega)\, e^{i\omega x}\, d\omega \end{aligned} \tag{4-17.19}$$


18. The input-output relation of electric networks. The Fourier 
transform is one of the most important tools of applied analysis. 
It plays a fundamental role in all electric network problems, but its 
applicability reaches over to a much wider field because the condi- 
tions which prevail in electric networks occur equally in many other 
problems of physics and engineering. 

The basic situation can be described as follows: We have a 
measuring device of the galvanometer type which records a certain 
given function of the time t. We have thus two functions, viz., the
“input function” or “signal,” f(t), and the “output function” or 
“response,” g(t). The measuring device may take the form of a 
telephone or loud-speaker or other communication device. It may 
equally take the form of a servo-mechanism which responds to a 
given “command.” Or it may take the form of a boundary value 
problem in which the right side of the given partial differential 
equation is the input, and the solution is the output. In all these 
cases the general pattern of action can be characterized by a number 
of features which are common to the entire group of problems. 

1 For historical reasons almost the entire mathematical literature discusses the
theory of the Fourier integral on the basis of the so-called “Fourier double
integral” which combines harmonic analysis and synthesis into a single operation,
thus obscuring the issue to such an extent that the theory of the Fourier integral,
in spite of its fundamental importance in extended fields of physics and engineering,
becomes frequently one of the most elusive and least understood chapters
of advanced analysis.

We have two functions f(t) and g(t), the input and the output. The
given physical mechanism determines the relation between these two
functions. We can conceive g(t) as a certain “mapping” of the function
f(t) and we will say that this mapping is “of the C type” (C
referring to communication) if the following general conditions are
realized:

1. The mapping is linear. This means that if f(t) is changed into
$\alpha f(t)$, also g(t) is changed into $\alpha g(t)$. Moreover, if f(t) is a linear
superposition of any number of functions:

$$f(t) = \alpha_1 f_1(t) + \alpha_2 f_2(t) + \cdots + \alpha_n f_n(t) \tag{4-18.1}$$

then g(t) is the same superposition of the corresponding mapped
functions.

$$g(t) = \alpha_1 g_1(t) + \alpha_2 g_2(t) + \cdots + \alpha_n g_n(t) \tag{4-18.2}$$

2. If f(t) is a periodic function, written in the complex form:

$$f(t) = e^{i\omega t} \tag{4-18.3}$$

then g(t) reproduces this function with a mere factor of proportionality.

$$g(t) = \varphi(\omega)\, e^{i\omega t} \tag{4-18.4}$$

The factor $\varphi(\omega)$ is generally complex and can thus be split into a real
and imaginary part.

$$\varphi(\omega) = A(\omega) + i B(\omega) \tag{4-18.5}$$

The factor $\varphi(\omega)$ is called the “transfer function.” The physical significance
of this function is that if the input is a strictly periodic
function of definite frequency and constant amplitude, the output is
likewise a periodic function of the same frequency but modified
amplitude and modified phase. The modification of amplitude and
phase is a function of the frequency and is characterized by $\varphi(\omega)$.
The response g(t) to a strictly sinusoidal input function is called the
“steady-state response” or “frequency response.”

Now the Fourier integral resolves an arbitrary f(t) into its harmonic
components (cf. 17.19).

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} F(\omega)\, e^{i\omega t}\, d\omega \tag{4-18.6}$$


In view of the superposition principle, the communication device
takes each one of these harmonic components and applies to it its
own transfer function. Then the harmonic components are synthesized
again. The result of this operation is

$$g(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} F(\omega)\, \varphi(\omega)\, e^{i\omega t}\, d\omega \tag{4-18.7}$$

Hence we see that we can find to any given f(t) the corresponding
g(t) if the transfer function $\varphi(\omega)$ is given.

However, the relation between f(t) and g(t) can still be expressed
in a totally different form, introducing a second fundamental
function by which the action of the network may be characterized.
We have seen (cf. 17.19) that by Fourier’s reciprocity theorem
the relation (18.6) is reversible.

$$F(\omega) = \int_{-\infty}^{+\infty} f(t)\, e^{-i\omega t}\, dt \tag{4-18.8}$$

We can introduce this $F(\omega)$ in the equation (7) and write the result
as follows.¹

$$g(t) = \int_{-\infty}^{+\infty} f(\tau)\, K(t - \tau)\, d\tau \tag{4-18.9}$$

where

$$K(\xi) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \varphi(\omega)\, e^{i\omega\xi}\, d\omega \tag{4-18.10}$$


Now we know in advance from the nature of a “response” that a
response cannot anticipate but must follow the signal. The future can
have no influence on the past. Hence g(t) cannot depend on values
of $f(\tau)$ which lie beyond the time moment $\tau = t$. For this reason we
must demand for all “stable” mappings of the C type that the
following additional condition shall hold.

$$K(t - \tau) = 0 \quad \text{for } \tau > t$$

which means

$$K(\xi) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \varphi(\omega)\, e^{i\omega\xi}\, d\omega = 0 \qquad (\xi < 0) \tag{4-18.11}$$

In this case the upper limit of the integral (9) becomes t instead of
infinity.

$$g(t) = \int_{-\infty}^{t} f(\tau)\, K(t - \tau)\, d\tau \tag{4-18.12}$$

A physical signal f(t) usually does not start at $t = -\infty$ but at a
definite time moment which may be normalized to t = 0. In this
case the output g(t) becomes

$$g(t) = \int_{0}^{t} f(\tau)\, K(t - \tau)\, d\tau \tag{4-18.13}$$


1 This move is not self-evident since a double limit process is involved. But the 
justification can be given for all functions which are of bounded variation. 
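
The output integral (4-18.13) is easily evaluated numerically. The following Python sketch is an illustration under stated assumptions, not a description of any particular network: it assumes the pulse response $K(\xi) = a\,e^{-a\xi}$ for $\xi \ge 0$ with a hypothetical decay constant a, and a unit step as input, for which the output is $1 - e^{-at}$. The function names are hypothetical.

```python
import math

def response(f, K, t, steps=2000):
    """g(t) = integral from 0 to t of f(tau) K(t - tau) d tau (trapezoidal rule)."""
    if t <= 0.0:
        return 0.0
    h = t / steps
    total = 0.5 * (f(0.0) * K(t) + f(t) * K(0.0))
    for j in range(1, steps):
        tau = j * h
        total += f(tau) * K(t - tau)
    return total * h

a = 2.0                                   # hypothetical decay constant

def K(xi):
    """Assumed pulse response: a exp(-a xi) for xi >= 0, zero before."""
    return a * math.exp(-a * xi) if xi >= 0.0 else 0.0

def step(t):
    """Unit step input switched on at t = 0."""
    return 1.0 if t >= 0.0 else 0.0

for t in (0.5, 1.0, 3.0):
    print(t, response(step, K, t), 1.0 - math.exp(-a * t))
```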




In the new representation the fundamental quantity which
characterizes the input-output relation of the given mechanism is
the function $K(\xi)$. This function has a very definite physical significance.
Let us employ a signal f(t) which shall last only during the
small time interval between t = 0 and $t = \varepsilon$. During this time interval
f(t) shall jump from 0 to the large constant value $1/\varepsilon$ and then
fall back again to the value 0. We now make $\varepsilon$ arbitrarily small. A
signal of this type (comparable to a hammer blow) is called a “unit
pulse”; the corresponding response is called the “pulse response.”

Since now f(t) lasts only for an infinitesimal time, during which the
function $K(t - \tau)$ is practically a constant, we can write equation (13)
in the form

$$g(t) = K(t) \int_{0}^{\varepsilon} f(\tau)\, d\tau = K(t) \tag{4-18.14}$$

This shows that the function K(t) has the significance of the pulse
response of the mechanism.¹
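
The limit process behind the pulse response can be watched numerically. The sketch below is an illustration only; it assumes the pulse response $K(\xi) = a\,e^{-a\xi}$ with a hypothetical decay constant a and evaluates the output (4-18.13) for rectangular inputs of height 1/ε and shrinking duration ε.

```python
import math

a = 2.0                                   # hypothetical decay constant

def K(xi):
    """Assumed pulse response a exp(-a xi), zero for negative argument."""
    return a * math.exp(-a * xi) if xi >= 0.0 else 0.0

def pulse_output(t, eps, steps=400):
    """Output (4-18.13) for an input of height 1/eps lasting from 0 to eps."""
    upper = min(t, eps)                   # the input vanishes after tau = eps
    if upper <= 0.0:
        return 0.0
    h = upper / steps
    total = 0.0
    for j in range(steps):
        tau = (j + 0.5) * h               # midpoint rule
        total += (1.0 / eps) * K(t - tau)
    return total * h

t = 1.0
for eps in (0.5, 0.1, 0.01, 0.001):
    print(eps, pulse_output(t, eps), K(t))
```

As ε decreases, the output at a fixed time t settles on the value K(t).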


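[Figure: the unit pulse f(t) of the short duration $\varepsilon$ and the pulse response g(t) = K(t), plotted against t; the abscissas 0, $\varepsilon$, and $T_0$ are marked.]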

1 For historical reasons electrical engineers frequently prefer to characterize
the transient behavior of a network by the “unit step function response.” The
two responses are in a simple relation to each other inasmuch as the pulse response
is the time derivative of the step function response.

According to (10) the pulse response and the steady-state response
are in a definite relation to each other. One is the Fourier transform
of the other. If the transfer function $\varphi(\omega)$ is given, we can obtain
K(t) on the basis of the Fourier integral (10). But then, by Fourier’s
reciprocity theorem we can also put

$$\varphi(\omega) = \int_{0}^{\infty} K(t)\, e^{-i\omega t}\, dt \tag{4-18.15}$$

(the lower limit being zero, since K(t) vanishes for negative t).
Splitting real and imaginary parts we obtain according to (5),

$$A(\omega) = \int_{0}^{\infty} K(t) \cos \omega t \, dt, \qquad B(\omega) = -\int_{0}^{\infty} K(t) \sin \omega t \, dt \tag{4-18.16}$$

This shows that $A(\omega)$ is always an even, $B(\omega)$ always an odd function
of $\omega$.

$$A(-\omega) = A(\omega), \qquad B(-\omega) = -B(\omega) \tag{4-18.17}$$


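Going back to equation (10) and writing it in real form we obtain

$$K(t) = \frac{1}{\pi} \left[ \int_{0}^{\infty} A(\omega) \cos \omega t \, d\omega - \int_{0}^{\infty} B(\omega) \sin \omega t \, d\omega \right] \tag{4-18.18}$$

However, the stability condition (11) establishes a further relation
between $A(\omega)$ and $B(\omega)$, viz., that for all positive t we must have

$$\int_{0}^{\infty} A(\omega) \cos \omega t \, d\omega = -\int_{0}^{\infty} B(\omega) \sin \omega t \, d\omega \tag{4-18.19}$$

Hence it suffices to know either the real or the imaginary part of the
transfer function.

$$K(t) = \frac{2}{\pi} \int_{0}^{\infty} A(\omega) \cos \omega t \, d\omega = -\frac{2}{\pi} \int_{0}^{\infty} B(\omega) \sin \omega t \, d\omega \tag{4-18.20}$$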
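
The relations (4-18.16), (4-18.17) and the first form of (4-18.20) can be tried out on a simple case. The following Python sketch is an illustration under stated assumptions: it assumes the pulse response $K(t) = a\,e^{-at}$ with a hypothetical constant a, for which $A(\omega) = a^2/(a^2 + \omega^2)$ and $B(\omega) = -a\omega/(a^2 + \omega^2)$, and it replaces the infinite limits by large finite ones. The function names are hypothetical.

```python
import math

a = 1.0                                   # hypothetical decay constant

def K(t):
    """Assumed pulse response a exp(-a t) for t >= 0."""
    return a * math.exp(-a * t) if t >= 0.0 else 0.0

def A_and_B(omega, t_max=40.0, steps=40000):
    """A(omega) and B(omega) of (4-18.16) by the midpoint rule on [0, t_max]."""
    h = t_max / steps
    A = B = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        A += K(t) * math.cos(omega * t)
        B -= K(t) * math.sin(omega * t)
    return A * h, B * h

def K_from_A(t, omega_max=2000.0, steps=200000):
    """First form of (4-18.20), using the closed form A = a^2 / (a^2 + omega^2)."""
    h = omega_max / steps
    total = 0.0
    for j in range(steps):
        omega = (j + 0.5) * h
        total += a * a / (a * a + omega * omega) * math.cos(omega * t)
    return 2.0 / math.pi * total * h

A_pos, B_pos = A_and_B(3.0)
A_neg, B_neg = A_and_B(-3.0)
print(A_pos, a * a / (a * a + 9.0))       # closed form of A for this K
print(B_pos, -3.0 * a / (a * a + 9.0))    # closed form of B for this K
print(A_neg - A_pos, B_neg + B_pos)       # symmetry relations (4-18.17)
print(K_from_A(1.0), K(1.0))
```

The last line shows the pulse response recovered from the real part of the transfer function alone. The second form of (4-18.20) is left out of the sketch because its integrand decreases only with the first power of the frequency and needs a more careful numerical treatment.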


19. Empirical determination of the input-output relation. It is 
frequently possible to determine the frequency response of a 
mechanism by direct observations. We bring the mechanism in 
forced vibrations by using a sinusoidal input function and waiting 
until the transient response has died away. If we then measure the 
amplitude of the output and the phase shift between the two functions,
we have obtained $\varphi(\omega)$ for a given frequency $\omega$. The measurements
have to be repeated for a sufficiently wide range of frequencies.

In many cases, application of a pulse as an input is more convenient.
By observing the output we are in possession of the function K(t)
whose Fourier transform gives $\varphi(\omega)$. The difficulty is only that the
physical realization of a very sudden “hammer blow” is frequently
not possible without doing serious damage to the given mechanism.
We have to be satisfied with a more general situation; we use some
observed function f(t) as input, and obtain some observed function
g(t) as output. From these data we have to obtain the pulse response
K(t) or the frequency response $\varphi(\omega)$ by calculation. Although we will
leave the shape of f(t) arbitrary, we still assume one general feature in
which it resembles a pulse: it shall start from zero at t = 0 and it
shall come down to zero again, after a certain finite time $t_0$.


There is one further feature of the pulse response which was not
included in our general discussions. All the devices used in communications
and servo-mechanisms have the property that they
dissipate the energy which was transferred to them by the input pulse.
Hence K(t) does not extend to infinity but becomes practically zero
after a certain time $T_0$, which we want to call the “memory time” of
the device.¹ Since we have assumed that the function f(t) vanishes
after the time interval $t_0$, we can now add that the output g(t) will
practically vanish after the time interval

$$T_1 = t_0 + T_0 \tag{4-19.1}$$


1 This memory time is not a sharply defined quantity since K(t) becomes zero
only asymptotically. But for any given accuracy a certain $T_0$ exists beyond which
K(t) becomes negligible. The greater the demanded accuracy, the greater is the
memory time $T_0$.

Let us now imagine that the same f(t) was applied again and again,
before and after t = 0, in intervals of $T_1$, i.e., with the starting times
$t = 0, \pm T_1, \pm 2T_1, \ldots$. Then f(t) becomes periodic with the period
$T_1$ and the same is true of g(t). Yet within the time interval $T_1$ no
change occurred in either of the functions, since we have waited
until g(t) became zero. But now we can analyze both f(t) and g(t) in
terms of harmonic functions by expanding both functions in a
Fourier series of the base $T_1$:

$$f(t) = \sum_{k=-\infty}^{+\infty} c_k\, e^{2\pi i k t/T_1}, \qquad g(t) = \sum_{k=-\infty}^{+\infty} c'_k\, e^{2\pi i k t/T_1} \tag{4-19.2}$$

Practically only a finite number of terms will be present in both
expansions, and the coefficients $c_k$ and $c'_k$ will be determined by
trigonometric interpolation rather than by integration (cf. § 11 to
§ 13). Our input function is now a superposition of strictly sinusoidal
functions, and the same can be said of the output. By the definition
of the transfer function $\varphi(\nu)$ we obtain

$$\varphi\!\left(\frac{k}{T_1}\right) = \frac{c'_k}{c_k} \qquad (k = 0, \pm 1, \pm 2, \ldots) \tag{4-19.3}$$

Although $\varphi(\nu)$ is thus obtained only at a discrete set of equidistant
points, we can interpolate between by using linear (or quadratic)
interpolation.
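
The procedure can be rehearsed on simulated data. The following Python sketch is an illustration under stated assumptions and not a measured example: the "observed" output is manufactured from an assumed pulse response $K(\xi) = a\,e^{-a\xi}$ by the integral (4-18.13), the input is an assumed smooth pulse of duration $t_0$, and the Fourier coefficients are formed by equidistant sums over one period $T_1$. All names and numerical values are hypothetical.

```python
import cmath
import math

a = 2.0            # hypothetical decay constant of the assumed pulse response
t0 = 1.0           # duration of the input
T1 = 8.0           # repetition period t0 + T0, with memory time T0 = 7
N = 1024           # number of equidistant samples per period
h = T1 / N

def f(t):
    """Assumed smooth input: sin^2 pulse on [0, t0], zero elsewhere."""
    return math.sin(math.pi * t / t0) ** 2 if 0.0 <= t <= t0 else 0.0

def K(xi):
    """Assumed pulse response, used here only to simulate the 'observed' output."""
    return a * math.exp(-a * xi) if xi >= 0.0 else 0.0

f_samples = [f(j * h) for j in range(N)]
m_last = int(round(t0 / h))                # f vanishes beyond this sample index

# Simulated observation g(t_j): trapezoidal form of the integral (4-18.13).
g_samples = []
for j in range(N):
    total = 0.0
    for m in range(0, min(j, m_last) + 1):
        weight = 0.5 if m in (0, j) else 1.0
        total += weight * f_samples[m] * K((j - m) * h)
    g_samples.append(total * h)

def coefficient(samples, k):
    """Fourier coefficient by the equidistant sum (trigonometric interpolation)."""
    return sum(s * cmath.exp(-2j * math.pi * k * j / N) for j, s in enumerate(samples)) / N

for k in range(1, 6):
    nu = k / T1
    measured = coefficient(g_samples, k) / coefficient(f_samples, k)   # ratio (4-19.3)
    expected = a / (a + 2j * math.pi * nu)                            # transform of the assumed K
    print(k, measured, expected)
```

The ratios of the coefficients reproduce the transfer function of the assumed mechanism at the frequencies $k/T_1$, although neither f(t) nor g(t) is a pulse.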

Even so, the difficulty remains that the given f(t) may have been
too smooth for our purpose. If f(t) does not contain enough
harmonic overtones, determination of the ratio (3) may be rendered
impossible beyond a certain k, because of the smallness of the denominator.
Then we have to stop with a $\varphi(\nu)$ which is still far from
being negligibly small. However, it suffices that we shall be able to
establish the asymptotic law of $\varphi(\nu)$. The speed with which $\varphi(\nu)$
decreases to zero is at least $\nu^{-1}$. If the last few observable $\varphi(\nu)$ fit the
pattern

$$\varphi(\nu) = \frac{i\alpha_1}{\nu} \tag{4-19.4}$$

or more accurately the pattern

$$\varphi(\nu) = \frac{i\alpha_1}{\nu} + \frac{\alpha_2}{\nu^2} \tag{4-19.5}$$

we can be satisfied, since for all frequencies beyond the range of our
measurements the value of $\varphi(\nu)$ can be obtained by extrapolation.


If the frequency response was evaluated in this fashion, the pulse
response too can be found, on the basis of the same data. We use
the device that we repeat the input pulse in regular intervals $T_1$.
The Fourier series of this pulse (“delta function”) is the divergent
series

$$f(t) = \frac{1}{T_1} \sum_{k=-\infty}^{+\infty} e^{2\pi i k t/T_1} \tag{4-19.6}$$

This series, although useless in itself, becomes convergent by applying
the transfer function to each of the frequencies present.

$$K(t) = \frac{1}{T_1} \sum_{k=-\infty}^{+\infty} \varphi\!\left(\frac{k}{T_1}\right) e^{2\pi i k t/T_1} \tag{4-19.7}$$

While this infinite series converges, the convergence may be too
slow for practical purposes, particularly if the asymptotic pattern of
$\varphi(\nu)$ is of the type (4). We speed up the convergence by putting

$$K(t) = A_1 e^{-\alpha t} + A_2\, t\, e^{-\alpha t} + K_1(t) \tag{4-19.8}$$

The transfer function of K(t) becomes

$$\varphi(\nu) = \frac{A_1}{\alpha + 2\pi i \nu} + \frac{A_2}{(\alpha + 2\pi i \nu)^2} + \varphi_1(\nu) \tag{4-19.9}$$

Hence for large $\nu$,

$$\varphi_1(\nu) = \varphi(\nu) + \frac{i A_1}{2\pi\nu} + \frac{A_2 - \alpha A_1}{4\pi^2\nu^2} \tag{4-19.10}$$

Let us now choose

$$A_1 = -2\pi\alpha_1, \qquad A_2 = -4\pi^2\alpha_2 + \alpha A_1 \tag{4-19.11}$$

Then the asymptotic law (5) is absorbed by the correction terms, and
$\varphi_1(\nu)$ decreases to zero with the third power of $\nu^{-1}$. The expansion (7)
of the function $K_1(t)$ has now sufficiently quick convergence. The
exponent $\alpha$ may be chosen as

$$\alpha = 2\pi/T_1 \tag{4-19.12}$$

because $e^{-2\pi}$ is sufficiently small to be considered negligible.
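
The choice of the two constants can be retraced by a short expansion for large frequencies. The following worked step is a sketch added for illustration; it assumes that the two correction terms have the transforms $A_1/(\alpha + 2\pi i\nu)$ and $A_2/(\alpha + 2\pi i\nu)^2$, and that the observed asymptotic pattern is written as $i\alpha_1/\nu + \alpha_2/\nu^2$ with real $\alpha_1$, $\alpha_2$.

```latex
\begin{aligned}
\frac{A_1}{\alpha + 2\pi i\nu}
  &= \frac{A_1}{2\pi i\nu}\Bigl(1 + \frac{\alpha}{2\pi i\nu}\Bigr)^{-1}
   = -\frac{iA_1}{2\pi\nu} + \frac{\alpha A_1}{4\pi^2\nu^2} + O(\nu^{-3}), \\
\frac{A_2}{(\alpha + 2\pi i\nu)^2}
  &= -\frac{A_2}{4\pi^2\nu^2} + O(\nu^{-3}), \\
\varphi_1(\nu)
  &= \frac{i\alpha_1}{\nu} + \frac{\alpha_2}{\nu^2}
     + \frac{iA_1}{2\pi\nu} + \frac{A_2 - \alpha A_1}{4\pi^2\nu^2} + O(\nu^{-3}).
\end{aligned}
```

Under these assumptions the terms in $\nu^{-1}$ and $\nu^{-2}$ cancel if $A_1 = -2\pi\alpha_1$ and $A_2 = \alpha A_1 - 4\pi^2\alpha_2$, so that the remainder decreases with the third power of $\nu^{-1}$.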




20. Interpolation of the Fourier transform. The infinite limits of
the Fourier transform can often be changed into finite limits, which
is sometimes of great operational advantage. Even if $f(t)$ extends to
infinity, we may subtract from it some suitably chosen function
$f_0(t)$ which imitates the asymptotic behavior of $f(t)$ for large values of
$t$. We then replace the original function $f(t)$ by the new function
$f_1(t) = f(t) - f_0(t)$ which is practically zero beyond a certain $t = \pm l$.
Thus we obtain

$$F(\nu) = F_1(\nu) + F_0(\nu) \tag{4-20.1}$$

where $F_0(\nu)$ is the Fourier transform of the suitably chosen function
$f_0(t)$, while

$$F_1(\nu) = \int_{-l}^{+l} f_1(t)\, e^{-2\pi i\nu t}\, dt \tag{4-20.2}$$

This new function has the remarkable property that it suffices
to give a certain set of fundamental data as “key values” from which
all other values of $F_1(\nu)$ are obtainable by interpolation.

We assume that $f_1(t)$ is a function of bounded variation defined in
the range $\pm l$. Hence $f_1(t)$ can be expanded into a convergent
Fourier series

$$f_1(t) = \sum_{k=-\infty}^{+\infty} c_k\, e^{\pi i k t/l} \tag{4-20.3}$$

where

$$c_k = \frac{1}{2l} \int_{-l}^{+l} f_1(t)\, e^{-\pi i k t/l}\, dt = \frac{1}{2l}\, F_1\!\left(\frac{k}{2l}\right) \tag{4-20.4}$$

If the infinite expansion (3) is introduced in (2) and we integrate term
by term, we obtain


$$F_1(\nu) = \frac{\sin 2\pi\nu l}{\pi} \sum_{k=-\infty}^{+\infty} F_1\!\left(\frac{k}{2l}\right) \frac{(-1)^k}{2l\nu - k} \tag{4-20.5}$$

This formula is an interpolation (or extrapolation) formula. It gives
$F_1(\nu)$ for all (real or complex) values of $\nu$ in the form of a convergent
infinite expansion, expressed in terms of the basic ordinates

$$y_k = F_1(k/2l)$$
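The interpolation formula (5) can be tried out numerically. The following is only a minimal sketch: the function names, the truncation of the infinite sum to a finite number of terms, and the test function (a triangle $f_1(t) = 1 - |t|/l$, whose transform is known in closed form) are assumptions made for illustration and are not part of the text.

```python
import math

def interpolate_transform(key_value, l, nu, terms=200):
    """Evaluate F1(nu) from the key values y_k = F1(k / 2l) via (4-20.5).

    key_value(k) must return y_k; the infinite sum is cut off at |k| = terms.
    """
    x = 2.0 * l * nu                      # position in units of the key spacing
    nearest = round(x)
    if abs(x - nearest) < 1e-12:          # nu coincides with a key point
        return key_value(nearest)
    total = 0.0
    for k in range(-terms, terms + 1):
        sign = -1.0 if k % 2 else 1.0     # (-1)**k
        total += sign * key_value(k) / (x - k)
    return math.sin(math.pi * x) / math.pi * total

def triangle_transform(nu, l=1.0):
    """Exact transform of the triangle f1(t) = 1 - |t|/l on (-l, +l)."""
    if nu == 0.0:
        return l
    s = math.sin(math.pi * nu * l) / (math.pi * nu * l)
    return l * s * s
```

Feeding the sketch the exact key values of the triangle and evaluating between the key points should reproduce the closed-form transform up to the truncation error of the sum.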


If $\nu$ is of the form

$$\nu = \frac{k + \epsilon}{2l}$$

and $\epsilon$ is small, $F_1(\nu)$ will be determined predominantly by $y_k$ and its
immediate neighbors. But if $\epsilon$ is around the halfway between two
key points, the convergence of the series (5) is very slow. We can
speed up the convergence by the following device. We define a set
of ordinates $u_k$ by the following recurrences:

$$\begin{aligned}
u_0 &= y_0 \\
u_1 &= y_1 - u_0, & u_{-1} &= y_{-1} - u_0 \\
u_2 &= y_2 - u_1, & u_{-2} &= y_{-2} - u_{-1} \\
&\;\;\vdots & &\;\;\vdots
\end{aligned} \tag{4-20.6}$$

Then the sum (5) can be replaced by the faster converging sum

$$F_1(\nu) = \frac{\sin 2\pi\nu l}{\pi} \left[ u_0 \left( \frac{1}{2l\nu} - \frac{1}{2l\nu - 1} - \frac{1}{2l\nu + 1} \right) - \sum_{k=1}^{\infty} \frac{(-1)^k u_k}{(k - 2l\nu)(k - 2l\nu + 1)} + \sum_{k=1}^{\infty} \frac{(-1)^k u_{-k}}{(k + 2l\nu)(k + 2l\nu + 1)} \right] \tag{4-20.7}$$

Here the weights of the ordinates away from the central ordinate fall
off quadratically with the distance and the number of terms needed
for satisfactory convergence becomes reasonably small.¹


1 Cf. [10], p. 441. 
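The recurrences (6) and the rearranged sum can be sketched in a few lines. This is an illustrative sketch only: the function name and the cutoff `terms` are assumptions, and the bracket is written in the form obtained by substituting $y_k = u_k + u_{k-1}$ and $y_{-k} = u_{-k} + u_{-(k-1)}$ into (5), with the $u_0$ contribution collected separately. It assumes $2l\nu$ is not an integer.

```python
import math

def accelerated_sum(y, l, nu, terms=50):
    """Evaluate F1(nu) from key values y(k) using the u_k recurrences (4-20.6).

    Assumes x = 2*l*nu is not an integer (at a key point F1 is simply y_k).
    """
    x = 2.0 * l * nu
    u0 = y(0)
    total = u0 * (1.0 / x - 1.0 / (x - 1.0) - 1.0 / (x + 1.0))
    u_pos = u_neg = u0
    for k in range(1, terms + 1):
        u_pos = y(k) - u_pos               # u_k    = y_k    - u_{k-1}
        u_neg = y(-k) - u_neg              # u_{-k} = y_{-k} - u_{-(k-1)}
        sign = -1.0 if k % 2 else 1.0      # (-1)**k
        total -= sign * u_pos / ((k - x) * (k - x + 1.0))
        total += sign * u_neg / ((k + x) * (k + x + 1.0))
    return math.sin(math.pi * x) / math.pi * total
```

A simple check is the case of a single nonzero ordinate $y_0 = 1$, for which (5) reduces to $\sin(2\pi\nu l)/(2\pi\nu l)$; the rearranged sum should approach the same value as the cutoff grows.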


21. Interpolatory filter analysis. We have seen in §18 that the 
frequency response of a communication device is the Fourier 
transform of its pulse response. In view of the fact that the pulse 
response $K(t)$ is not an arbitrary function of $t$ but a function which
vanishes for all negative values of $t$, the frequency response is by no
means a freely choosable function but a function which has to
satisfy some very definite analytical conditions, expressed in the
symmetry relations (18.17) and in the integral condition (18.19). Now
in the construction of electric filters we would like to obtain a $\phi(\nu)$ of
prescribed properties, and the question arises to what extent we can 
satisfy certain given filter properties, without violating the analytical 
conditions which are demanded by the general nature of the C type 
kind of mapping. 

The interpolation formula of the previous section is of great value 
in the discussion of this problem. We have seen that the pulse 
response of any stable network dissipates the input energy and thus
comes to practically zero after a certain time $T_0$, the “memory time”
of the network. Consequently the function $\phi(\nu)$ is of a very definite
character. Not only is the lower limit of the integration zero, but the
upper limit is a finite time $T_0$.

$$\phi(\nu) = \int_0^{T_0} K(t)\, e^{-2\pi i\nu t}\, dt \tag{4-21.1}$$

If we now transform $t$ to a new variable $t_1$ by the transformation

$$t = t_1 + \tfrac{1}{2} T_0 \tag{4-21.2}$$

and put

$$K(t_1 + \tfrac{1}{2} T_0) = K_1(t_1) \tag{4-21.3}$$

we obtain

$$\phi(\nu) = e^{-\pi i\nu T_0}\, \phi_1(\nu) \tag{4-21.4}$$

where

$$\phi_1(\nu) = \int_{-T_0/2}^{+T_0/2} K_1(t_1)\, e^{-2\pi i\nu t_1}\, dt_1 \tag{4-21.5}$$


This expression is of the form (20.2) studied before, the limit $l$
being replaced by $T_0/2$. Hence the interpolation formula (20.5)
becomes valid again. The key values are

$$y_k = \phi_1(k/T_0) \tag{4-21.6}$$

They can be prescribed with considerable freedom, except for the
reality condition

$$y_{-k} = y_k^* \tag{4-21.7}$$

and the convergence condition

$$\sum_{k=0}^{\infty} |y_k| = \text{finite} \tag{4-21.8}$$

It is of interest to observe that while not even an arbitrarily small
continuous portion of $\phi(\nu)$ can be freely prescribed, yet it is
permissible to prescribe $\phi_1(\nu)$ practically freely in an infinity of equidistant
points, belonging to the frequencies

$$\nu = 0,\quad \nu_0 = 1/T_0,\quad 2\nu_0,\quad 3\nu_0,\ \ldots \tag{4-21.9}$$




The question remains whether the interpolation of $\phi_1(\nu)$ between
these points will be sufficiently smooth. Now the function generated
by one single ordinate $y_k = 1$ is given by

$$p(\nu) = \frac{\sin \pi(T_0\nu - k)}{\pi(T_0\nu - k)} \tag{4-21.10}$$

It is of the character of the Dirichlet kernel (2.8). In view of the
secondary maxima of this function the concentration to the immediate
neighborhood of $\nu = k\nu_0$ is not very strong. The consequence is
that the interpolation is smooth only if the $y_k$ follow a smooth
pattern. Near a discontinuity, however, the cumbersome Gibbs
oscillations will come into play (cf. § 9). But we have seen how
beneficially the large amplitudes of these oscillations can be cut down
by the method of $\sigma$ smoothing. This smoothing can be applied to our
present problem. It consists in taking the local arithmetic average of
$\phi_1(\nu)$ between $\nu + 1/T_0$ and $\nu - 1/T_0$. The result of this operation is
that the function (10) is replaced by the function

$$\bar p(\nu) = \frac{1}{2\pi} \left[ \operatorname{Si} \pi(T_0\nu - k + 1) - \operatorname{Si} \pi(T_0\nu - k - 1) \right] \tag{4-21.11}$$

where

$$\operatorname{Si}(\xi) = \int_0^{\xi} \frac{\sin t}{t}\, dt \tag{4-21.12}$$

This function is called the “sine integral”; it is a well-investigated
and well-tabulated function.¹ The new function is satisfactorily
concentrated to the immediate neighborhood of the central maximum.

The effect of this smoothing operation, if applied to the equation
(5), is that $K_1(t_1)$ is to be replaced by

$$\bar K_1(t_1) = K_1(t_1)\, \frac{\sin (2\pi t_1/T_0)}{2\pi t_1/T_0} \tag{4-21.13}$$

The function $K_1(t_1)$ itself is expressible in terms of the prescribed
ordinates $y_k$, according to (20.3).

$$K_1(t_1) = \frac{1}{T_0} \sum_{k=-\infty}^{+\infty} y_k\, e^{2\pi i k t_1/T_0} \tag{4-21.14}$$


1 Tables of the Sine, Cosine and Exponential Integrals, Nat. Bur. Standards, 
Vols. I, II. 




One of the classical problems of filter analysis is the construction 
of an “ideal low-pass filter.” This filter would absorb all frequencies 
beyond a certain cutoff frequency v, but would let through all 
frequencies below v, without any change of amplitude or phase. The 
nearest solution of this problem can be given by the interpolatory 
treatment and subsequent elimination of the Gibbs oscillations by
$\sigma$ smoothing. The pulse response associated with this filter comes
out as follows 


RGt,) = sin 2rv,t, cos (wt,/T>) (4l <TD 
Ont, (4-21.15) 
= 0 (| t, | > TD 


In all our discussions we were concerned with the function $\phi_1(\nu)$
and not with the original function $\phi(\nu)$. The relation between these
two functions is given by (4). By definition the significance of $\phi(\nu)$ is
given as follows:

Input: $f(t) = e^{2\pi i\nu t}$
Output: $g(t) = \phi(\nu)\, e^{2\pi i\nu t} = \phi_1(\nu)\, e^{2\pi i\nu(t - T_0/2)}$


We see that the output constructed on the basis of $\phi_1(\nu)$ lags behind
the input by the constant time delay $T_0/2$. Hence the freedom we have
in constructing filters of prescribed characteristics is restricted by the
fact that there is an inevitable constant time lag, equal to one-half
of the memory time, associated with the use of the filter. In many
communication problems this time lag is of no further consequence.
But if we cannot permit an adequate time lag and have to cut down
$T_0$ to a too insignificant amount, we gradually lose accuracy in
realizing the filter characteristics, because the points at which $\phi_1(\nu)$
can be prescribed grow too far apart (on account of the largeness of
the basic frequency $\nu_0 = 1/T_0$) and we miss important details of the
desired curve.


22. Search for hidden periodicities. In certain meteorological 
and astronomical problems, in analysis of tides, and in all situations 
in which hidden periodicities are suspected, the following mathe- 
matical problem is encountered. We know that a certain function 
$f(t)$ is resolvable into components which are strictly periodic,
although the periodicities are generally not in a harmonic ratio to
each other. We want to obtain the unknown frequencies and amplitudes
of each of the components. What we have at our disposal are
a large number of observations, taken at equidistant time intervals.
Our conclusions have to be drawn from the information conveyed
to us by these data.

We will assume that the total number of observational data is the
odd number $2N + 1$. Moreover, we will put the time moment $t = 0$
in the mid-point of our data. Finally, if our observations were made
in intervals of $\tau$, we will rescale the original variable $t_1$ to a new
variable

$$t = \frac{t_1}{\tau} \tag{4-22.1}$$

The observations now belong to the time moments

$$t_k = 0, \pm 1, \pm 2, \ldots, \pm N \tag{4-22.2}$$

The readings shall be denoted by

$$f_k = f(t_k) \tag{4-22.3}$$

The function $f(t)$ is generally of the following form:

$$f(t) = \sum_{\alpha=1}^{j} (A_\alpha \cos \theta_\alpha t + B_\alpha \sin \theta_\alpha t) \tag{4-22.4}$$
«=l 


The number of periodic components, denoted by $j$, is usually not
known in advance. Nor can we say anything about the range of the
angular frequencies $\omega_\alpha = \theta_\alpha/\tau$. However, we can say in advance that
in the new variable $t$ the angular frequencies $\theta_\alpha$ have to be restricted
by the inequality

$$\theta_\alpha < \pi \tag{4-22.5}$$

because, if $\theta_\alpha$ surpasses this limit, two frequencies $\pi + \beta$ and $\pi - \beta$
cannot be distinguished. We will put

$$\theta_\alpha = \frac{\pi}{N}\, p_\alpha \tag{4-22.6}$$

and let $p_\alpha$ range between 0 and $N$.




We separate the sine and the cosine functions by forming the sums
and differences

$$\begin{aligned}
f(t) + f(-t) &= 2 \sum_{\alpha=1}^{j} A_\alpha \cos \theta_\alpha t \\
f(t) - f(-t) &= 2 \sum_{\alpha=1}^{j} B_\alpha \sin \theta_\alpha t
\end{aligned} \tag{4-22.7}$$

Correspondingly our ordinates are separated into two groups:

$$\begin{aligned}
u_k &= f_k + f_{-k} \\
v_k &= f_k - f_{-k}
\end{aligned} \qquad (k = 0, 1, 2, \ldots, N) \tag{4-22.8}$$

We will employ the method of the Fourier transform (cf. 17.8) but
adapted to summation instead of integration. We transform the
original set of $u_k$ data into a new set of $N + 1$ cosine amplitudes $a_k$
and the $v_k$ data into a set of $N - 1$ sine amplitudes $b_k$. These
amplitudes are obtained by multiplying the basic data by a prescribed
matrix, containing the cosines and sines of the multiples of $\pi/N$.

$$\begin{aligned}
a_k &= {\sum_{\alpha=0}^{N}}' u_\alpha \cos \frac{\pi}{N} \alpha k \\
b_k &= \sum_{\alpha=1}^{N-1} v_\alpha \sin \frac{\pi}{N} \alpha k
\end{aligned} \tag{4-22.9}$$

The symbol $\sum'$ refers to the fact that the first and the last functional
data $u_0$ and $u_N$ enter the sum with half weight only. These $a_k$ and $b_k$
can be plotted as a “line spectrum,” belonging to integer values
$p = k$ of the continuous parameter $p$. The entire further analysis
will be based on these two new sequences.

Let us first consider the case that the given frequencies $\theta_\alpha$ are such
that all the $p_\alpha$ of the relation (6) become integers. Then the amplitudes
(9) give us directly the solution of our problem. Most of the $a_k$ and $b_k$
will be zero. If a certain $a_k$ or $b_k$ is not zero, this indicates that the
frequency


$$\omega = \frac{\pi}{N\tau}\, k \tag{4-22.10}$$


is present in our data. The associated cosine amplitude becomes

$$A = \frac{1}{N}\, a_k \tag{4-22.11}$$

while the associated sine amplitude becomes

$$B = \frac{1}{N}\, b_k \tag{4-22.12}$$
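For the integer case the sums (8) and (9) can be written out directly. The sketch below is illustrative only: the function name and the choice of a dictionary keyed by the integer time $t_k$ are assumptions, not part of the text.

```python
import math

def line_spectrum(f, N):
    """Cosine and sine amplitudes a_k, b_k of (4-22.9).

    f maps the integer time t_k in -N..N to the reading f_k.
    Returns two lists of length N + 1; b[0] and b[N] vanish up to rounding.
    """
    u = [f[j] + f[-j] for j in range(N + 1)]   # even part, (4-22.8)
    v = [f[j] - f[-j] for j in range(N + 1)]   # odd part, (4-22.8)
    a, b = [], []
    for k in range(N + 1):
        # First and last ordinates enter with half weight only.
        s = 0.5 * u[0] + 0.5 * u[N] * math.cos(math.pi * k)
        s += sum(u[j] * math.cos(math.pi * j * k / N) for j in range(1, N))
        a.append(s)
        b.append(sum(v[j] * math.sin(math.pi * j * k / N) for j in range(1, N)))
    return a, b
```

For a single component with integer $p$, all amplitudes except $a_p$ and $b_p$ should vanish up to rounding, and dividing the two survivors by $N$ should return $A$ and $B$ as in (11) and (12).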


In the general case the exact $p$ value associated with a certain
frequency $\theta_\alpha$ will lie between two integers $m$ and $m + 1$, recognizable
by the fact that the regular $+-$ pattern of the amplitudes $a_k$ and $b_k$
is occasionally interrupted by a $++$ or $--$ pattern. We now put

$$p = m + \epsilon \tag{4-22.13}$$

and obtain $\epsilon$ by interpolation. For this purpose we use the “second
sum” method (cf. 3-5.28), described in III, 5. By this method the
mutual interference of peaks is greatly diminished. The ratio $q$
of two neighboring second sums is formed (cf. 3-5.30) and $\epsilon$ is
obtained on the basis of (3-5.31). An excellent check on the accuracy
of our observations is provided by the fact that the position of the
peaks of both the $a_k$ and the $b_k$ amplitudes must be the same. Hence
we have two independent determinations of $\theta_\alpha$, once using the $a_k$
and once the $b_k$.

After obtaining $\epsilon$ we now possess the frequency $\theta_\alpha$ due to the
relation

$$\theta_\alpha = \frac{\pi}{N}\, (m + \epsilon) \tag{4-22.14}$$

Moreover, the amplitudes $A_\alpha$, $B_\alpha$ associated with this frequency can
be calculated on the basis of the Fourier amplitudes $a_m$ and $b_m$:

$$A_\alpha = \frac{a_m}{N}\, \frac{\pi\epsilon}{\sin \pi\epsilon}, \qquad B_\alpha = \frac{b_m}{N}\, \frac{\pi\epsilon}{\sin \pi\epsilon} \tag{4-22.15}$$


This method of isolating periodic components operates very 
satisfactorily if the total number of observations 2N + 1 is sufficiently 
large. It is necessary that two p, values of the relation (6) shall be 
separated by at least 4 units in order to isolate two neighboring peaks. 
If they move nearer together, their mutual interference increases and 
it becomes gradually more difficult to separate them properly. 




If we cannot count on a sufficiently large value of N and the peaks 
move closer together, an altogether different approach is advocated. 
As we have seen earlier (cf. § 21), the focusing power of the function
(2.8) can be greatly increased by the method of the $\sigma$ smoothing. In
our present problem this method amounts to a modification of the
basic data $u_k$ and $v_k$ which appear in the Fourier sums (9). Before
forming the Fourier sums we modify the given data by applying to
them a properly chosen weight factor, according to the following
definitions:

$$\bar u_k = u_k \sigma_k, \qquad \bar v_k = v_k \sigma_k \tag{4-22.16}$$

where

$$\sigma_k = \frac{\sin (k\pi/N)}{k\pi/N} \tag{4-22.17}$$

The $a_k$, $b_k$ of (9) are now calculated with the help of these new $\bar u_k$, $\bar v_k$.
The result of this modification is that the function $(\sin \pi x)/\pi x$ is
replaced by the function

$$S(x) = \frac{1}{2\pi} \left[ \operatorname{Si}(\pi x + \pi) - \operatorname{Si}(\pi x - \pi) \right] \tag{4-22.18}$$

The secondary peaks of this function are 4.8, 2.0, 1.1%, … of the
central peak compared with 21.7, 12.8, 9.1%, … of the previous
function (cf. Figure in § 7). This demonstrates the stronger focusing
power of the new function, and the practical independence of the new
maxima. At the same time the peaks are now somewhat broader
than they were before. But this broadening of the lines has the
beneficial effect that a parabola of second order can be laid through
the maximum amplitude and two of its neighbors, obtaining the
position and magnitude of the true maximum by this parabolic
interpolation. We operate with the maximum amplitude $a_m$ and its
left and right neighbor $a_{m-1}$, $a_{m+1}$. Then the $\epsilon$ of the formulas (13)
and (14) becomes

$$\epsilon = \frac{1}{2}\, \frac{a_{m+1} - a_{m-1}}{2a_m - (a_{m+1} + a_{m-1})} \tag{4-22.19}$$

while the maximum ordinate $\bar a_m$ becomes

$$\bar a_m = a_m + \frac{\epsilon}{4}\, (a_{m+1} - a_{m-1}) \tag{4-22.20}$$

Then

$$A_\alpha = 1.6963\, \frac{\bar a_m}{N} \tag{4-22.21}$$

(The numerical factor represents the reciprocal of $S(0)$.) The same
calculation holds for the sine amplitude $B_\alpha$, replacing the $a_k$ by the
$b_k$.¹
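The $\sigma$-weighted sums and the parabolic interpolation of (16) to (21) can be put together as follows. This is a rough sketch under stated assumptions: the function name is hypothetical, only the cosine amplitudes are treated, the data are assumed to contain one dominant line whose peak lies strictly between $k = 1$ and $k = N - 1$, and the peak is picked simply as the largest $|a_k|$.

```python
import math

def smoothed_peak(f, N):
    """Locate the strongest cosine line after sigma weighting.

    f maps the integer time t_k in -N..N to the reading f_k.
    Returns (p, A): the interpolated position m + eps and the amplitude
    estimate 1.6963 * a_max / N of (4-22.21).
    """
    def sigma(k):
        return 1.0 if k == 0 else math.sin(k * math.pi / N) / (k * math.pi / N)

    # Weighted even ordinates, (4-22.8) combined with (4-22.16).
    u = [(f[j] + f[-j]) * sigma(j) for j in range(N + 1)]
    a = []
    for k in range(N + 1):
        s = 0.5 * u[0] + 0.5 * u[N] * math.cos(math.pi * k)
        s += sum(u[j] * math.cos(math.pi * j * k / N) for j in range(1, N))
        a.append(s)
    m = max(range(1, N), key=lambda k: abs(a[k]))
    # Parabola through the maximum and its two neighbors, (4-22.19), (4-22.20).
    eps = 0.5 * (a[m + 1] - a[m - 1]) / (2.0 * a[m] - (a[m + 1] + a[m - 1]))
    a_max = a[m] + 0.25 * eps * (a[m + 1] - a[m - 1])
    return m + eps, 1.6963 * a_max / N
```

Since the smoothed peak is only approximately parabolic, the recovered amplitude for a line lying midway between two integers can be expected to come out somewhat low; the position is much less affected.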


23. Separation of exponentials. In the previous section the 
question was discussed of analyzing a function which is composed of 
periodic components. In radioactive decay measurements a similar 
problem arises, but here the periodic functions are replaced by 
exponential functions. Given is a function of the following form. 


$$f(x) = A_1 e^{-\lambda_1 x} + A_2 e^{-\lambda_2 x} + \cdots + A_m e^{-\lambda_m x} \tag{4-23.1}$$

Our aim is to find the “decay constants” $\lambda_i$ and the amplitudes $A_i$.
On the surface the problem is very similar to the previous one: the
frequency $\omega_i$ has changed to an imaginary frequency $i\omega_i$. But in
actual fact the two problems are far apart because the exponential
functions in no way display the remarkable orthogonality properties
of periodic functions. For this reason it is not enough to discuss the
purely mathematical solution of the problem but we have to go into
its numerical aspects as well. We will first deal with the theoretical
side of the problem and then come to its numerical implications.

We know that an ordinary differential equation with constant
coefficients has as its solution a linear combination of exponentials.
The solution is thus of the form (1). The same is true of a difference
equation with constant coefficients in which the operation $d/dx$ is
replaced by the operation $\Delta/\Delta x$. Generally, let us assume that the
function $y = f(x)$ has the property that there exists a definite linear
relation between $m + 1$ equidistant ordinates:

$$c_0 f(x) + c_1 f(x + h) + c_2 f(x + 2h) + \cdots + c_m f(x + mh) = 0 \tag{4-23.2}$$

The solution of this functional equation is

$$f(x) = A_1 e^{-\lambda_1 x} + A_2 e^{-\lambda_2 x} + \cdots + A_m e^{-\lambda_m x} \tag{4-23.3}$$


1 The numerical schemes discussed in this section are much more accurate 
than the traditional “periodogram” methods which fail to recognize the funda- 
mental connection between the problem of hidden periodicities and the theory of 
the Fourier transform. 


Substitution in (2) shows that the exponents $\lambda_i$ are obtainable by
the following method. We put

$$e^{-\lambda_i h} = \xi_i \tag{4-23.4}$$

and obtain an algebraic equation for the determination of the $\xi_i$:

$$c_0 + c_1 \xi + c_2 \xi^2 + \cdots + c_m \xi^m = 0 \tag{4-23.5}$$

The $m$ roots of this equation are $\xi = \xi_1, \xi_2, \ldots, \xi_m$. Then, taking the
natural logarithms,

$$\lambda_i = -\frac{1}{h} \log \xi_i \tag{4-23.6}$$

The smallest number of ordinates we must have at our disposal
is $2m$ since both the exponents $\lambda_i$ and the amplitudes $A_i$ are unknown.
Let these ordinates be denoted by $y_1, y_2, \ldots, y_{2m}$. We now obtain
the $c_i$ by solving the following set of linear equations (we normalize
the highest coefficient $c_m$ to 1):

$$\begin{aligned}
y_1 c_0 + y_2 c_1 + \cdots + y_m c_{m-1} + y_{m+1} &= 0 \\
y_2 c_0 + y_3 c_1 + \cdots + y_{m+1} c_{m-1} + y_{m+2} &= 0 \\
&\;\;\vdots \\
y_m c_0 + y_{m+1} c_1 + \cdots + y_{2m-1} c_{m-1} + y_{2m} &= 0
\end{aligned} \tag{4-23.7}$$


In actual fact we will not choose neighboring data for $y_1, y_2, \ldots$, but
will try to make $h$ as large as possible in order to reduce the effect of
observational errors. If for example our task is to separate $m = 4$
exponentials and we have 40 equidistant data at our disposal, we will
divide these data into $2m = 8$ groups of 5 consecutive observations
each. The sum of the 5 ordinates in each group provides us with 8
new ordinates $\bar y_k$ which will be used as the $y_k$ of the system (7). Then
the $h$ of the equations (2) and (6) is not the time interval $\Delta x$ of two
neighboring measurements, but 5 times that interval.

Let us assume that we have found the solution of the system (7),
then formed the algebraic equation (5) and found its roots. Finally
we have obtained the $\lambda_i$ according to (6). We now come to the second
half of our problem, viz., the determination of the amplitudes $A_i$.
For this purpose only $m$ data are needed. Let us assume that we use
the data belonging to the time moments

$$x = 0,\ h,\ 2h,\ \ldots,\ (m - 1)h$$


These data shall be denoted by $y_0, y_1, \ldots, y_{m-1}$. The equations to be
solved are now

$$\begin{aligned}
A_1 + A_2 + \cdots + A_m &= y_0 \\
A_1 p_1 + A_2 p_2 + \cdots + A_m p_m &= y_1 \\
A_1 p_1^2 + A_2 p_2^2 + \cdots + A_m p_m^2 &= y_2 \\
&\;\;\vdots \\
A_1 p_1^{m-1} + A_2 p_2^{m-1} + \cdots + A_m p_m^{m-1} &= y_{m-1}
\end{aligned} \tag{4-23.8}$$

if we put

$$p_1 = e^{-\lambda_1 h}, \quad p_2 = e^{-\lambda_2 h}, \quad \ldots, \quad p_m = e^{-\lambda_m h} \tag{4-23.9}$$

The problem (8) is known as the “problem of weighted moments.”
It is a fundamental problem of applied analysis which appears in
numerous combinations (cf. e.g., VI, 13). It is solvable by a
particularly simple and elegant numerical algorithm. We first construct
the “fundamental polynomial”

$$F_m(p) = (p - p_1)(p - p_2) \cdots (p - p_m) = f_0 + f_1 p + \cdots + f_{m-1} p^{m-1} + p^m \tag{4-23.10}$$

Then a second polynomial $G_{m-1}(p)$ is constructed by multiplying
$F_m(p)$ by a polynomial which proceeds in reciprocal powers of $p$:

$$(f_0 + f_1 p + \cdots + p^m)(y_0 p^{-1} + y_1 p^{-2} + \cdots + y_{m-1} p^{-m})$$

The numerical scheme is similar to our ordinary longhand multiplication
scheme of two decimal numbers. We omit the powers of $p$ and
write down only the coefficients $f_1, f_2, f_3, \ldots$ (they correspond to the
digits of a decimal number). They are in succession multiplied by $y_0$.
Then we multiply similarly by $y_1$ but we indent one place to the left.
Then we multiply equally by $y_2$, again indenting one place to the left.
Thus we continue until $y_{m-1}$ is reached. All elements corresponding
to negative powers of $p$ are omitted. For this reason the scheme does
not extend to the left beyond the first column. Hence the second row
contains only $m - 1$ elements, the third row only $m - 2$ elements, …,
the last row only 1 element. Finally (just as in ordinary multiplication),
the sum of each column is formed:

$$\begin{array}{llllll}
 & f_1 y_0 & f_2 y_0 & \cdots & f_{m-1} y_0 & y_0 \\
 & f_2 y_1 & f_3 y_1 & \cdots & y_1 & \\
 & \vdots & & & & \\
 & f_{m-1} y_{m-2} & y_{m-2} & & & \\
 & y_{m-1} & & & & \\
\hline
\text{Sum:} & g_0 & g_1 & \cdots & g_{m-2} & g_{m-1}
\end{array} \tag{4-23.11}$$

This yields the polynomial

$$G_{m-1}(p) = g_0 + g_1 p + g_2 p^2 + \cdots + g_{m-1} p^{m-1} \tag{4-23.12}$$


Now the unknowns $A_i$ of our problem (8) are obtained by a simple
substitution scheme. We substitute $p = p_i$ in $G_{m-1}(p)$ and we do
likewise in the derivative of $F_m(p)$. The ratio of these two numbers
gives $A_i$:

$$A_i = \frac{G_{m-1}(p_i)}{F_m'(p_i)} \tag{4-23.13}$$
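The whole scheme (10) to (13) fits in a short routine. The following is a sketch for illustration only: the function name and the list-based representation of the polynomials (lowest power first) are assumptions, and the column sums are computed directly as $g_n = \sum_j f_{n+j+1}\, y_j$ rather than by writing out the indented rows.

```python
def weighted_moments(p, y):
    """Solve sum_i A_i * p_i**j = y_j (j = 0..m-1) for the A_i.

    p: the m distinct numbers p_i of (4-23.9); y: the m data y_0..y_{m-1}.
    """
    m = len(p)
    # Coefficients f_0..f_m of F_m(p) = prod (p - p_i), lowest power first.
    f = [1.0]
    for root in p:
        nxt = [0.0] * (len(f) + 1)
        for j, c in enumerate(f):
            nxt[j + 1] += c            # contribution of c * p
            nxt[j] -= root * c         # contribution of c * (-root)
        f = nxt
    # Column sums of the multiplication scheme (4-23.11).
    g = [sum(f[n + j + 1] * y[j] for j in range(m - n)) for n in range(m)]
    amplitudes = []
    for pi in p:
        G = sum(g[n] * pi ** n for n in range(m))                    # G_{m-1}(p_i)
        dF = sum(n * f[n] * pi ** (n - 1) for n in range(1, m + 1))  # F_m'(p_i)
        amplitudes.append(G / dF)
    return amplitudes
```

Generating the data $y_j$ from known amplitudes and known $p_i$ and feeding them back should return the amplitudes up to rounding, provided the $p_i$ are distinct.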

Another method of obtaining the solution of (8) is to construct the
inverse of the matrix of the linear system (8). If the elements of the
inverse are denoted by $q_{ik}$, we obtain

$$q_{ik} = \frac{F_{m-k}(p_i)}{F_m'(p_i)} \qquad (i, k = 1, 2, \ldots, m) \tag{4-23.14}$$

where the polynomial $F_{m-k}(p)$ is formed with the help of the last
elements of the fundamental polynomial $F_m(p)$, e.g.,

$$F_2(p) = f_{m-2} + f_{m-1} p + p^2$$

The successive numerators of $q_{ik}$ (for fixed $i$) can be obtained also by
synthetic division (cf. I, 8). If $F_m(p)$ is divided by the root factor
$p - p_i$, the successive coefficients of the division scheme yield
$F_0(p_i), F_1(p_i), \ldots, F_{m-1}(p_i)$.

This simple and straightforward mathematical solution of the 
separation problem would hardly indicate what enormous practical 
difficulties arise if we try to apply it to physical problems. The 
difficulty is caused by the fact that the solution of the equations (7) 
for the coefficients c, succeeds only if the data are given with excessive 
accuracy. If the separation of four or five exponentials is demanded, 
the associated linear system (7) becomes so strongly skew-angular 
that an accuracy of 6 to 8 significant figures would be needed in the 
$y_k$ for their successful solution. Such an accuracy is completely
unrealistic if compared with the actual accuracy of decay measure- 
ments. Even the separation of three exponentials might encounter 
already unsurmountable difficulties. The following example is well 
suited to demonstrate the surprising numerical snags which may 
develop on account of the exceedingly nonorthogonal behavior of 
exponential functions. 




The following set of 24 decay observations were obtained in time
intervals of 3 minutes, i.e., 0.05 hour, if the hour is accepted as the
unit of time in our problem; hence $\Delta x = 0.05$, starting with the
time moment $x = 0$:

k | y_k | k | y_k | k | y_k | k | y_k
1 | 2.51 | 7 | 0.77 | 13 | 0.27 | 19 | 0.11

The observations are considered as accurate to ½ unit of the second
decimal.

We do not know how many exponentials are present in our data.
We first try a separation in three exponentials. Hence we divide our
data into 6 groups, taking in each group the sum of four consecutive
ordinates: $k = 1$ to 4, 5 to 8, etc. The new data become (omitting
the decimal point which for our present purposes is irrelevant):

759, 346, 168, 87, 49, 30

Hence the equations for the determination of the $c_i$ become

$$\begin{aligned}
759c_0 + 346c_1 + 168c_2 + 87 &= 0 \\
346c_0 + 168c_1 + 87c_2 + 49 &= 0 \\
168c_0 + 87c_1 + 49c_2 + 30 &= 0
\end{aligned}$$

Dividing by the factor of $c_0$ we get

$$\begin{aligned}
c_0 + 0.4559c_1 + 0.2213c_2 + 0.1146 &= 0 \\
c_0 + 0.4855c_1 + 0.2514c_2 + 0.1416 &= 0 \\
c_0 + 0.5179c_1 + 0.2917c_2 + 0.1786 &= 0
\end{aligned}$$

By subtraction the following two equations result for the determination
of $c_1$ and $c_2$:

$$\begin{aligned}
0.0296c_1 + 0.0301c_2 + 0.0270 &= 0 \\
0.0324c_1 + 0.0402c_2 + 0.0370 &= 0
\end{aligned} \tag{4-23.15}$$

If the second equation is multiplied by 0.75, it becomes

$$0.0242c_1 + 0.0302c_2 + 0.0277 = 0$$


Now the accuracy of our observations is such that we cannot
guarantee the accuracy of these coefficients with more than 1 unit in
the second decimal. But this shows that the two equations (15) are
redundant within the errors of our observations. Under no circumstances
can they serve for an independent determination of $c_1$ and $c_2$.
This shows that our measurements will be describable by only two
exponentials. Hence we divide our data now into four groups of
6 ordinates each and form the sum within each group. This gives the
new data

964, 309, 115, 51

and we obtain the two equations

$$\begin{aligned}
964c_0 + 309c_1 + 115 &= 0 \\
309c_0 + 115c_1 + 51 &= 0
\end{aligned}$$

with the solution

$$c_0 = 0.1640, \qquad c_1 = -0.8839$$

This yields the quadratic equation

$$\xi^2 - 0.8839\,\xi + 0.1640 = 0$$

which has the two roots

$$\xi_1 = 0.619, \qquad \xi_2 = 0.265$$

The $h$ of the formula (6) is now $6 \cdot 0.05 = 0.3$. The two exponents
$\lambda_1$ and $\lambda_2$ thus become

$$\lambda_1 = 4.45, \qquad \lambda_2 = 1.58$$
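The two-exponential step can be retraced with a few lines of arithmetic. The sketch below is an illustration, not the author's computing scheme: the function name is hypothetical, the $2 \times 2$ system of type (7) is solved by Cramer's rule, and the quadratic (5) by the usual root formula. Because the inputs are rounded group sums, agreement with the figures quoted above can be expected only to about two or three significant figures.

```python
import math

def separate_two_exponentials(sums, h):
    """Decay constants from four grouped ordinates.

    sums: (y1, y2, y3, y4), the group sums; h: spacing between groups.
    Solves y1*c0 + y2*c1 + y3 = 0 and y2*c0 + y3*c1 + y4 = 0, then the
    quadratic xi**2 + c1*xi + c0 = 0, then lambda = -log(xi) / h.
    """
    y1, y2, y3, y4 = sums
    det = y1 * y3 - y2 * y2
    c0 = (y2 * y4 - y3 * y3) / det        # Cramer's rule
    c1 = (y2 * y3 - y1 * y4) / det
    disc = math.sqrt(c1 * c1 - 4.0 * c0)  # assumes two real roots
    xi = ((-c1 + disc) / 2.0, (-c1 - disc) / 2.0)
    lam = tuple(-math.log(r) / h for r in xi)
    return c0, c1, xi, lam

c0, c1, xi, lam = separate_two_exponentials((964, 309, 115, 51), 0.3)
```

The larger root belongs to the slower decay, so `lam[0]` is the smaller exponent and `lam[1]` the larger one.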


We now come to the determination of the two amplitudes $A_1$ and $A_2$.
In principle two observations are sufficient for this purpose. We
will use, however, the abundance of our data for checking up on the
constancy of $A_i$. We can conceive the solution of the linear set (8)
in the following light. We choose a linear combination of $m$ equidistant
data which reduces the weight factors of all the $A_k$ to zero,
with the exception of the single weight $A_i$. Now if this linear combination
is systematically applied in succession to the entire table of data,
we succeed in isolating the single exponential function $A_i e^{-\lambda_i x}$. For
example in our problem we may combine the data $k = 1$ and 5 for
the annihilation of $A_2$. But then we can take the same linear combination
of the data $k = 2$ and 6, then 3 and 7, and so on. We thus obtain
the successive values of the function $A_1 e^{-\lambda_1 x}$. If now we multiply by
$e^{\lambda_1 x}$, we should get a constant, except for the inevitable scatter caused
by the inaccuracy of the data. If we can detect a linear trend in this
series, we will lay a least-square straight line through the set, of the
form

$$A + Bx = A(1 + \beta x), \qquad \beta = \frac{B}{A}$$

which in view of the smallness of $\beta$ may be written in the form $Ae^{\beta x}$
and used as a correction of the previous value of $\lambda$:

$$A e^{\beta x} e^{-\lambda x} = A e^{-(\lambda - \beta)x}$$


Carrying through this process in our example, we find that no
linear trend can be detected beyond the natural scatter of the results.
For example the first 12 consecutive values for the determination of
$A_1$, apart from a constant factor which can be applied later, become:

887, 870, 872, 842, 872, 867, 861, 869, 862, 908, 890, 914

(Here we stopped, since the magnification factor due to multiplication
by $e^{\lambda_1 x}$ is here already 14.4.) By taking the arithmetic mean of
these values we obtain 876.1 which leads to

$$A_1 = 2.202$$

Similar is the situation concerning $A_2$. Here too a consecutive set
of 16 values for $A_2$ failed to show a linear trend. The arithmetic mean
gave

$$A_2 = 0.305$$

and thus the final result becomes

$$f(x) = 2.202\,e^{-4.45x} + 0.305\,e^{-1.58x} \tag{4-23.16}$$


Here we have the mathematical representation of our data with the
help of two exponentials. As a check we now generate this $f(x)$ at
all the 24 data points $x = 0, 0.05, 0.1, 0.15, \ldots, 1.15$, obtaining the
following table:

k | y_k | k | y_k | k | y_k | k | y_k
1 | 2.507 | 7 | 0.769 | 13 | 0.270 | 19 | 0.114
2 | 2.044 | 8 | 0.639 | 14 | 0.230 | 20 | 0.100
3 | … | 9 | 0.533 | 15 | 0.197 | 21 | 0.088

If we compare this table with the original table of our data, we
notice that the deviation is never larger than 0.005, except in the
single instance of $k = 5$, where the error reaches the magnitude
0.006. We can characterize the “average deviation” by taking the
square root of the following quantity: the sum of the squares of the
individual deviations, divided by 24. What we get is only 0.0026,
which is well within the error limits of our data. Hence we can consider
the law (16) as a perfectly satisfactory representation of our
data.

In actual fact we have failed dismally in our task, since the given 
data were constructed on the basis of the mathematical law 


$$f(x) = 0.0951\,e^{-x} + 0.8607\,e^{-3x} + 1.5576\,e^{-5x} \tag{4-23.17}$$


Not only did we lose the relatively weakly represented exponent, but 
the presence of this component has a contaminating influence on the 
other two exponents by reducing the exponent 3 to 1.58 and the 
exponent 5 to 4.45. Moreover, the approximate ratio 1 : 2 of the 
amplitudes was distorted to 1: 7. Our result is thus completely 
unsatisfactory. And yet our solution is “numerically equivalent” to 
the true solution since it gives a perfectly satisfactory fit of the data 
within the experimental errors. It would be idle to hope that some 
other modified mathematical procedure could give better results, 
since the difficulty lies not with the manner of evaluation but with 
the extraordinary sensitivity of the exponents and amplitudes to very 
small changes of the data, which no amount of least-square or other 
form of statistics could remedy. The only remedy would be an 
increase of accuracy to limits which are far beyond the possibilities 
of our present measuring devices. 
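The “numerical equivalence” of the two laws is easy to check directly. The following sketch simply evaluates (16) and (17) at the 24 data points and compares them; the variable and function names are chosen for illustration, and the comparison is made against the unrounded law (17) rather than against the two-decimal observations.

```python
import math

def f_true(x):
    """The generating law (4-23.17) with three exponentials."""
    return (0.0951 * math.exp(-x) + 0.8607 * math.exp(-3.0 * x)
            + 1.5576 * math.exp(-5.0 * x))

def f_fit(x):
    """The two-exponential representation (4-23.16)."""
    return 2.202 * math.exp(-4.45 * x) + 0.305 * math.exp(-1.58 * x)

xs = [0.05 * k for k in range(24)]                  # x = 0, 0.05, ..., 1.15
deviations = [f_fit(x) - f_true(x) for x in xs]
max_dev = max(abs(d) for d in deviations)           # largest pointwise gap
rms_dev = math.sqrt(sum(d * d for d in deviations) / len(deviations))
```

Both `max_dev` and `rms_dev` should come out at the level of a few units in the third decimal, that is, of the same order as the rounding of the observations, although the exponents and amplitudes of the two laws differ widely.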

Under these circumstances we will aim at a more modest task.
Let us assume that we have some preliminary information about the
$\lambda_i$, because of which we know the number of exponential components
and an approximate value of each exponent. Our problem is now
reduced to the determination of the $A_i$, together with a correction of
the given $\lambda_i$.

Again we will proceed in a manner similar to the last phase of our
previous problem. On the basis of the given $\lambda_i$ we construct such a
linear combination of the data which isolates the single exponential
function $A_i e^{-\lambda_i x}$. By applying the same linear combination to a
homologous set of data we will obtain a series of let us say $2n$
equidistant values of this function. We divide the sum of the first
$n$ values by the sum of the second $n$ values, which gives

$$e^{n\lambda_i \Delta x}$$

from which $\lambda_i$ can be determined. We now have one corrected $\lambda_i$, and
we use this new value of $\lambda_i$ plus the previously given $\lambda_k$ to carry out the
correction scheme for another of the $\lambda_k$. Thus we continue, taking into
account at every new step the corrected values of the exponents.
Finally we have corrected all the $\lambda_i$ and then we come to the determination
of the $A_i$ in a similar manner as we have done before.

Let us apply this procedure to our previous example. The given 
values of the exponents shall be 


$\lambda_1 = 1.2$ (instead of 1), $\lambda_2 = 2.7$ (instead of 3), $\lambda_3 = 6$ (instead of 5)
The result of our calculations is as follows. We obtain 
f@= 0.041e~ 9-59 4. 0.79e—?-78% 4. 1.68 e— £962 (4-23.18) 


Compared with the correct expression (17) the result is again dis- 
appointing. But again we get a perfectly satisfactory fit of the given 
data within the accuracy of our measurements. This example shows 
that we cannot expect satisfactory results in the separation problem 
of exponentials even if we know in advance the approximate values 
of the decay constants and we have merely to refine these values. On 
the other hand, the picture would have been quite different if our 
data had had a ten times greater accuracy. 


24. The Laplace transform. Closely related to the theory of the 
Fourier transform is a somewhat different transform which plays a 
superior role in many problems of mathematical physics and is of 
particular usefulness in problems of network analysis. This is the 
so-called “Laplace transform,” defined by 


L(z) = f “ety (x) dx (4-24,1) 


If imaginary values are assigned to z, then (1) assumes the form 
(17.16) which defined the Fourier transform of a function f(x). This 
function is now specified to exist only between 0 and œ, while for 
negative values we put f(z) = 0. However, the new lower limit 0 
instead of —oo has a profound effect on the analytical nature of 


§ 24 The Laplace Transform 281 


£(z). While the previous F(v) (cf. 17.16) could generally be defined 
only for real values of v, the new transform is an analytical function 
of the complex variable z throughout the complex half plane 
A(z) > 0, provided that f(x) is an absolutely integrable function of 
x in the infinite interval (0,00). Even if f(x) does not decrease to 
infinity fast enough to be absolutely integrable, it is possible that 
f(x)e~® is already integrable, with an arbitrarily small e. Then we 
can put 


z=e+2, f(x) =f(x)e (4-24,2) 


and consider the Laplace transform of f(x). We then lose for our 
original variable z a small strip of the width € next to the imaginary 
axis. Since, however, € can be made arbitrarily small, the analytical 
nature of £(z) everywhere to the right of the imaginary axis is still 
guaranteed. For example, for f(x) = 1 we obtain 


[e 6) 


L£(z) = I e dx = s (4-24.3) 


This function of z, originally defined only for 2(z) > 0, can now by 
analytical continuation be extended to the entire complex plane, with 
the only exception of the singular point z = 0. 

By a simultaneous change of the signs of x and z we can define a
Laplace transform $\mathcal{L}_2(z)$ for the negative half plane

$$\mathcal{L}_2(z) = \int_{-\infty}^{0} e^{-zx} f_2(x)\,dx \tag{4-24.4}$$

and then combine $\mathcal{L}(z) + \mathcal{L}_2(z)$ to one single function. The common
boundary of the two definitions is the imaginary axis, where we now
get

$$\mathcal{L}(i\omega) + \mathcal{L}_2(i\omega) = \int_{-\infty}^{+\infty} e^{-i\omega x} f(x)\,dx \tag{4-24.5}$$


This relation shows that the Fourier transform of an absolutely
integrable function of the range [−∞, +∞] can be conceived as the
sum of two Laplace transforms, taken along the imaginary axis, one
analytical in the right, the other analytical in the left half plane.

There is no point-to-point relation between the function f(x) and
the Laplace transform $\mathcal{L}(z)$. The value of $\mathcal{L}(z)$ at any point depends
on the totality of values f(x). There is one exception, however. Let
us assume that f(x) is analytical around the origin x = 0. Furthermore,
let us assign a large positive real value to z. Then the smallness
of $e^{-zx}$ for any x which is appreciably different from zero reduces the
realm of integration to a very small neighborhood of x = 0. Under
these circumstances we obtain


$$\int_0^\infty f(x)\,e^{-zx}\,dx \approx f(0)\int_0^\infty e^{-zx}\,dx = \frac{f(0)}{z} \tag{4-24.6}$$


More generally, expanding f(x) around x = 0 into a convergent 
Taylor series: 


$$f(x) = c_0 + c_1 x + c_2 x^2 + \cdots \tag{4-24.7}$$


and integrating term by term we obtain for the corresponding 
Laplace transform, for large values of z (assuming that the real part 
of z is positive): 


$$\mathcal{L}(z) = \frac{c_0}{z} + \frac{1!\,c_1}{z^2} + \frac{2!\,c_2}{z^3} + \frac{3!\,c_3}{z^4} + \cdots \tag{4-24.8}$$


Hence there is a point-to-point correlation between the infinitesimal
neighborhood of the point x = 0 of f(x) and the infinitesimal
neighborhood of the point z = ∞ of the Laplace transform.
However, the series (8) need not converge for any value of z. 
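As a hedged illustration of this last remark (the function is chosen here for the example and is not taken from the text), consider $f(x) = 1/(1+x)$, whose Taylor coefficients around x = 0 are $c_k = (-1)^k$. Term-by-term integration as in (8) then gives

$$\mathcal{L}(z) \sim \frac{1}{z} - \frac{1!}{z^2} + \frac{2!}{z^3} - \frac{3!}{z^4} + \cdots$$

The ratio of successive terms has the magnitude $k/|z|$, which eventually exceeds 1 for every finite z, so that the series diverges everywhere. It nevertheless describes the behavior of the transform near z = ∞: for a large positive z the first few terms decrease rapidly before the divergence sets in.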


25. Network analysis and Laplace transform. Application of the 
Laplace transform to the equations of electric networks opened a 
new perspective which was anticipated by Heaviside’s operational 
methods. The justification of Heaviside’s ingenious intuitions was 
given when the discovery was made that the Laplace transform of the 
input and output functions of electric networks automatically 
satisfied the algebraic equations which Heaviside postulated in a less 
rigorous manner. A simultaneous set of ordinary differential equa- 
tions with constant coefficients is transformed into a set of simple 
linear algebraic equations which connects the Laplace transforms of 
the original unknowns. If the block diagram of the electric network 
is given, we can solve the input-output relation of electric networks 
by solving a set of simultaneous linear equations. The solution gives 
the input-output relation in the form of a ratio of two algebraic 
polynomials p(z) and q(z). The fundamental quantity is here the
pulse response of the network. The Laplace transform of the unit
pulse becomes 1. The Laplace transform of the output function (i.e.,
of the pulse response) becomes


$$\mathcal{L}(z) = \int_0^\infty e^{-zt} K(t)\,dt \tag{4-25.1}$$


It is this $\mathcal{L}(z)$ which is directly derivable from the given block
diagram of the network:

$$\mathcal{L}(z) = \frac{p(z)}{q(z)} \tag{4-25.2}$$

where the order of q(z) is at least one higher than the order of p(z).
The first physical quantity which is directly at our disposal is the
frequency response of the network. The frequency response $\varphi(\omega)$,
expressed in terms of the angular frequency (cf. 17.19), is obtained by
replacing z by $i\omega$.


$$\varphi(\omega) = \mathcal{L}(i\omega) \tag{4-25.3}$$


The frequency response is thus available without any integration, 
purely on the basis of knowing the elements of the network and their 
interconnection. The pulse response, on the other hand, is the result 
of a Fourier transform (cf. 18.10): 


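$$K(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \mathcal{L}(i\omega)\,e^{i\omega t}\,d\omega \tag{4-25.4}$$

However, this integration, as was observed by Heaviside, need
not be carried out because we can resolve the ratio (2) into partial
fractions.

$$\frac{p(z)}{q(z)} = \sum_{i=1}^{n} \frac{a_i}{z - \lambda_i} \tag{4-25.5}$$

Here $\lambda_i$ are the roots of the denominator, while

$$a_i = \frac{p(\lambda_i)}{q'(\lambda_i)}$$

Since we know by elementary integration that

$$\int_0^\infty e^{-zt}\,e^{\lambda_i t}\,dt = \frac{1}{z - \lambda_i} \tag{4-25.6}$$

we find

$$K(t) = \sum_{i=1}^{n} a_i\,e^{\lambda_i t} \tag{4-25.7}$$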


This is Heaviside’s celebrated “expansion theorem” (in slight 
modification, since Heaviside was interested in the step function 
response which is the integral of the pulse response). 
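The expansion theorem (7) can be sketched in a few lines. This is an illustration under stated assumptions, not the author's procedure: the roots $\lambda_i$ are taken as already known and simple, and the function names and the sample network $p(z) = 1$, $q(z) = z^2 + 3z + 2$ are hypothetical.

```python
import cmath

def poly_eval(coeffs, z):
    """Horner evaluation; coeffs run from the highest power to the constant term."""
    acc = 0
    for c in coeffs:
        acc = acc * z + c
    return acc

def poly_derivative(coeffs):
    """Coefficients of the derivative, again from the highest power downward."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]

def pulse_response(p, q, roots, t):
    """K(t) = sum of a_i exp(lambda_i t) with a_i = p(lambda_i) / q'(lambda_i).

    Assumes every root of q is simple, so that q'(lambda_i) is nonzero.
    """
    dq = poly_derivative(q)
    total = 0
    for lam in roots:
        a = poly_eval(p, lam) / poly_eval(dq, lam)
        total += a * cmath.exp(lam * t)
    return total

# Sample data: q(z) = (z + 1)(z + 2), hence K(t) = exp(-t) - exp(-2t).
p = [1.0]
q = [1.0, 3.0, 2.0]
roots = [-1.0, -2.0]
K1 = pulse_response(p, q, roots, 1.0)
```

For complex roots the terms combine in conjugate pairs, so the imaginary part of the returned value vanishes up to rounding.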


26. Inversion of the Laplace transform. Although Heaviside’s
expansion theorem solves the problem of the transient response of
electric networks, yet in actual practice we frequently prefer other
solutions which circumvent finding the generally complex roots of an
algebraic equation. In fact, in the case of complicated block diagrams
encountered in servo-problems of the guided missile type, even the
construction of the polynomials p(z), q(z) as actual algebraic operators
may encounter insuperable practical difficulties, although it may not be
too difficult to obtain the numerical value of $\mathcal{L}(z)$ for any prescribed
value of z. Hence we can ask the following question, and it is a
question which is encountered in many other physical situations
besides the case of electric networks: Given the Laplace transform of
a function f(t) (the “indicial function”), at certain suitably chosen
points of the complex plane, find the original (indicial) function f(t).

Since the range of t is infinite, the scale in which t is measured is in
principle arbitrary. In actual fact the proper scaling of t is of utmost
importance. The original scale in which t is measured may be
entirely inadequate to our problem. In network problems K(t) has
the significance of the pulse response of the network. In view of the
practically finite “memory time” T of the network, K(t) is of
interest only up to a certain T, but generally this may not be known in
advance. If the time unit is chosen as too large, the essential portion
of K(t) is crowded into a very small interval around t = 0. If, on the
other hand, the time unit is chosen as too small, the essential portion
of K(t) spreads out too far. In both cases the convergence of the
expansions to be studied in the next sections will suffer.

The memory time of the network could well be chosen as a natural
unit of time. But in the absence of this knowledge we can still
introduce a reasonable, although not necessarily safe, normalization
of the time scale. Even without detailed investigation of the
distribution of the roots of the denominator q(z) we can say something
about their general location. Let the factor of $z^n$ of q(z) be normalized
to 1. Then the factor $q_{n-1}$ of the power $z^{n-1}$ has the following
significance. It gives the negative sum of all the roots. Hence

$$-\frac{q_{n-1}}{n} = \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_n}{n} \tag{4-26.1}$$


gives the center of mass of all the generally complex roots of the
denominator (i.e., of the characteristic frequencies of the network).
Since the $\lambda_i$ must all have a negative real part in order to make the
network stable, the coefficient $q_{n-1}$ must be positive. We can assign
any value to $q_{n-1}$ by the scale transformation

$$z_0 = \alpha z \tag{4-26.2}$$

where we have denoted with $z_0$ the original inadequately normalized
variable of the Laplace transform, while z denotes the final variable
with which we want to operate. We will choose the scale factor $\alpha$ by
the condition that the new $q_{n-1}$ shall become n. This means that the
new time unit is chosen in such a way that the center of mass of the
characteristic vibrations becomes −1. The transformation (2) of z
is paralleled by the transformation of the time scale.¹

$$t_0 = \frac{t}{\alpha} \tag{4-26.3}$$
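The normalization just described amounts to a one-line computation. A minimal sketch follows, assuming that $z_0 = \alpha z$ is substituted in a denominator whose leading coefficient is 1 and that the result is again divided through to restore the leading coefficient 1; the function name and the sample polynomial are hypothetical.

```python
from fractions import Fraction

def normalize_time_scale(q0):
    """Rescale q(z0) so that the coefficient of z^(n-1) becomes n.

    q0 lists the coefficients of q(z0) from the highest power to the
    constant term, with leading coefficient 1. Returns (alpha, q), where
    z0 = alpha * z and q lists the coefficients in the new variable z.
    """
    n = len(q0) - 1
    alpha = Fraction(q0[1]) / n          # the new q_{n-1} is q0[1] / alpha
    q = [Fraction(c) / alpha ** i for i, c in enumerate(q0)]
    return alpha, q

# Sample denominator with the roots -10 and -20 (center of mass -15).
alpha, q = normalize_time_scale([1, 30, 200])
```

In the new variable the roots are −2/3 and −4/3, whose center of mass is −1 as required. The numerator p must be rewritten with the same substitution; the constant factor this introduces is the subject of the footnote to (2).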


1 A mere substitution of the form (2) in $\mathcal{L}(z_0)$ assumes that the Laplace
transform is an invariant of a linear transformation. Actually the
transformation of $dt_0$ requires that (2) should be divided by $\alpha$. Since,
however, in network problems f(t) has the significance of the unit pulse,
and this pulse changes by the same factor because of the scale
transformation, we are justified in the assumption that the Laplace
transform is an invariant of the scale transformation (2).

We will consider four different solutions of the inversion problem.
According to the given circumstances we may prefer, for numerical
purposes, one or the other of these solutions. But each of these
solutions has its own mathematical merits.


27. Inversion by Legendre polynomials. Our first solution utilizes
equidistant real values of the variable z. We make the transformation

$$e^{-t} = \xi \tag{4-27.1}$$

The merit of this transformation is that it transforms the infinite
range [0, ∞] of t into the finite range [0, 1] of ξ. We consider now
f(t) as a function of the new variable ξ (we indicate that by writing
f(ξ)) and obtain the Laplace transform in the new form

$$\mathcal{L}(z) = \int_0^1 f(\xi)\,\xi^{z-1}\,d\xi \tag{4-27.2}$$

We will consider the equidistant set of points

$$z = 1, 2, 3, 4, \cdots \tag{4-27.3}$$

Then we possess the “moments” of f(ξ).

$$y_k = \mathcal{L}(k+1) = \int_0^1 f(\xi)\,\xi^k\,d\xi \tag{4-27.4}$$
Moreover, if $p_n(\xi)$ is any polynomial of ξ,

$$p_n(\xi) = p_n^0 + p_n^1\,\xi + \cdots + p_n^n\,\xi^n \tag{4-27.5}$$

and we weight the $y_k$ by the coefficients of this polynomial,

$$C_n = \sum_{k=0}^{n} y_k\,p_n^k \tag{4-27.6}$$

we obtain the definite integral

$$C_n = \int_0^1 f(\xi)\,p_n(\xi)\,d\xi \tag{4-27.7}$$

We will use the operational notation $p_n(y)$ in the following sense.
We write out the polynomial $p_n(y)$ and replace $y^k$ by $y_k$. With this
convention, formula (7) may be written in the operational form,

$$C_n = p_n(y) \tag{4-27.8}$$


We will now introduce the Legendre polynomials $P_k(x)$ [cf.
(5-20.11)], but normalizing their range of orthogonality to the range
[0, 1], instead of the customary range [−1, 1]. These polynomials are
now directly expressible in terms of the hypergeometric function.

$$P_k^*(x) = F(-k,\,k+1,\,1;\,x) \tag{4-27.9}$$

Their “norm” is

$$N_k = \int_0^1 P_k^{*2}(x)\,dx = \frac{1}{2k+1} \tag{4-27.10}$$

We can expand our f(ξ) in these polynomials [cf. (5-16.9)]

$$f(\xi) = \sum_{k=0}^{\infty} (2k+1)\,c_k\,P_k^*(\xi) \tag{4-27.11}$$

with

$$c_k = \int_0^1 f(\xi)\,P_k^*(\xi)\,d\xi = P_k^*(y) \tag{4-27.12}$$


Hence we have obtained an explicit representation of f(ξ) in the
form of an infinite convergent expansion, whose coefficients are
calculable in terms of $y_k$, i.e., in terms of equidistant values of the
Laplace transform along the real axis.
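The scheme (4) to (12) can be sketched as follows. This is an illustration only: the function names are hypothetical, exact rational arithmetic is used to sidestep the accuracy problem, and the sample transform $\mathcal{L}(z) = 1/(z+1)$, i.e. $f(t) = e^{-t}$ or f(ξ) = ξ, is chosen for the example. The coefficients of $P_k^*$ are taken from the hypergeometric form (9).

```python
from fractions import Fraction
from math import comb

def shifted_legendre(k):
    """Coefficients (ascending powers of x) of P*_k(x) = F(-k, k+1, 1; x)."""
    return [(-1) ** j * comb(k, j) * comb(k + j, j) for j in range(k + 1)]

def legendre_inversion(y, xi):
    """Evaluate the truncated expansion (11) at xi, given y[k] = L(k + 1).

    c_k is formed operationally as in (12): write out P*_k(y) and read
    the power y^j as the moment y[j].
    """
    total = Fraction(0)
    for k in range(len(y)):
        coeffs = shifted_legendre(k)
        c_k = sum(a * y[j] for j, a in enumerate(coeffs))
        p_val = sum(a * xi ** j for j, a in enumerate(coeffs))
        total += (2 * k + 1) * c_k * p_val
    return total

# Moments of f(xi) = xi: y_k = L(k + 1) = 1 / (k + 2).
y = [Fraction(1, k + 2) for k in range(4)]
xi = Fraction(1, 3)
value = legendre_inversion(y, xi)
```

Because f(ξ) = ξ is itself a polynomial of first order, all $c_k$ beyond k = 1 vanish and the truncated expansion reproduces f(ξ) exactly.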

The coefficients of $P_k^*(x)$ possess the numerically valuable property
that they are all integers; they can be pretabulated (cf. Table IV), in
the form of a triangular matrix. Multiplication of the $y_k$ with this
matrix provides us with the coefficients $c_k$ which, after multiplication
by (2k + 1), give the coefficients of the expansion (11). Unfortunately,
the elements of this matrix increase very rapidly. This
has the consequence that the values $y_k$ have to be given with excessive
accuracy, in order to evaluate the $c_k$ with even moderate accuracy.
At k = 10 the matrix elements increase up to $2 \cdot 10^6$. Hence it is
unsafe to go beyond k = 10, even if the ten-digit accuracy of the
customary desk-machines is fully utilized. The accuracy with which
the $y_k$ have to be given is here already 10 significant figures. Mere
measurements can never cope with such an accuracy, and thus it is
demonstrated that physical observations of the Laplace transform can
never solve the problem of restoring the indicial function with any
degree of accuracy. The relation between the data and the original
function is excessively nonorthogonal, and the slightest “noise” of
the data destroys all hope of inverting an empirically given Laplace
transform.
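The growth of the matrix elements is easy to inspect. The sketch below is an illustration, not a reproduction of Table IV: it assumes the hypergeometric form (9), for which the coefficient of $x^j$ in $P_k^*$ has the magnitude $\binom{k}{j}\binom{k+j}{j}$, and the function name is hypothetical.

```python
from math import comb

def largest_coefficient(k):
    """Largest absolute coefficient of the shifted Legendre polynomial P*_k."""
    return max(comb(k, j) * comb(k + j, j) for j in range(k + 1))

# Magnitude of the largest matrix element in rows 5, 10 and 14.
growth = {k: largest_coefficient(k) for k in (5, 10, 14)}
```

An error δ in a single $y_k$ can thus appear in $c_{10}$ multiplied by a factor of the order $10^6$, which is why measured data are of no use here.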

Here we observe a repetition of the problem encountered in the 
resolution of a function into a set of exponentials (cf. § 23), and the 
example is more than accidental, since a function of this type can 
actually be conceived as a Laplace transform. 




The situation is quite different, however, if L(z) is a theoretically 
given quantity, such as the ratio of two polynomials which character- 
izes electric network problems: 


$$\mathcal{L}(z) = \frac{p(z)}{q(z)}$$


In order to obtain the $y_k$ we have to substitute the successive integers
1, 2, 3, ⋯ in two polynomials, which is a quick and simple process.
The coefficients of these polynomials should not be given with more
than a minimum number of significant figures (2 or 3). But in all
subsequent calculations, full ten-place accuracy is required. In fact,
if we want to go beyond 11 terms of the expansion (beyond n = 10),
it is necessary that the basic data shall be given with 15 decimal places.
This is by no means difficult since p(k), q(k), substituting the
subsequent integers, will seldom go beyond 10 significant figures, even if
absolute accuracy is used. Hence it is necessary only to perform the
division with 15 significant figures, which requires that we shall put
the remainder of the first division process back in the machine and
repeat the division with this new numerator, with no change of the
denominator. Moreover, the coefficients of $P_k^*(x)$ do not grow beyond
10 decimal places up to n = 14, and thus require no double precision.
An expansion of more than 15 terms will seldom be demanded if the
scale factor $\alpha$ was chosen properly.


28. Inversion by Chebyshev polynomials. Although the Legendre
polynomials $P_k(x)$ are tabulated,¹ a more convenient expansion is
obtainable in the form of a Fourier series. We introduce the angle
variable θ by putting

$$e^{-t} = \xi = \frac{1 + \cos\theta}{2} = \cos^2\frac{\theta}{2} \tag{4-28.1}$$

1 Cf. Tables of the Legendre Polynomials (Macmillan, New York, 1946).

We will consider f(ξ) a function of the angle variable θ, with the range
[0, π], and expand it in a Fourier sine series.

$$f(\theta) = \frac{4}{\pi}\left(b_1\sin\theta + b_2\sin 2\theta + b_3\sin 3\theta + \cdots\right) \tag{4-28.2}$$

For this purpose the boundary conditions f(0) = f(π) = 0 have to
be satisfied. Now the point θ = π corresponds to t = ∞, and there
the pulse response is automatically zero. On the other hand, at the
point θ = 0, i.e., t = 0, the function is generally not zero. We can
make it zero, however, by a simple artifice. If $\mathcal{L}(z)$ is an infinity of the
form $\rho_1/z$ (cf. 24.8), we will put

$$f_1(t) = f(t) - \rho_1 e^{-t} \tag{4-28.3}$$

and

$$\mathcal{L}_1(z) = \mathcal{L}(z) - \frac{\rho_1}{z+1} \tag{4-28.4}$$

Then $\mathcal{L}_1(z)$ decreases toward infinity as $z^{-2}$, and $f_1(t)$ is zero at t = 0.
The natural boundary conditions of the expansion (2) are now
satisfied, and the convergence of the series will be satisfactory.

The coefficients $b_k$ of the series (2) are determined in the standard
manner.

$$b_k = \frac{1}{2}\int_0^\pi f(\theta)\sin k\theta\,d\theta = \int_0^1 f(\xi)\,\frac{\sin k\theta}{\sin\theta}\,d\xi \tag{4-28.5}$$


Now the function (sin kθ)/sin θ conceived as a function of ξ is a
polynomial, called “Chebyshev polynomial of the second kind”
(cf. III, 4 and V, 20).

$$\frac{\sin k\theta}{\sin\theta} = U_{k-1}^*(\xi) \tag{4-28.6}$$

The coefficients of these polynomials are once more integers and once
more a direct relation to the hypergeometric series can be established.

$$U_{k-1}^*(\xi) = (-1)^{k-1}\,k\,F\!\left(k+1,\,-k+1,\,\tfrac{3}{2};\,\xi\right) \tag{4-28.7}$$

Application of (27.7-8) to our case gives

$$b_k = U_{k-1}^*(y) \tag{4-28.8}$$

The coefficients of the polynomials can be pretabulated (cf. Table IX).
Again they form a triangular matrix. Multiplication of the basic
values $y_k$ with this matrix again generates the coefficients of an
orthogonal expansion, in analogy to the previous solution, but with
the added advantage that the resulting series can be written as a
Fourier sine series in the angle variable θ. Once more extreme
numerical accuracy has to be maintained and we cannot go beyond
n = 10 if the $y_k$ are not given with more than 10-place accuracy. The
10-place stage for the coefficients of $U_{k-1}^*(\xi)$ is reached at k = 14.
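A minimal sketch of this second scheme follows. It rests on assumptions that should be checked against the printed formulas: the shifted polynomials are taken as $U_m^*(\xi) = U_m(2\xi - 1)$ and generated by the standard three-term recurrence, and the sine series is assumed to carry the factor 4/π in front, so that $b_k = U_{k-1}^*(y)$ holds without a further constant. The function names and the sample transform $\mathcal{L}(z) = 1/(z+1) - 1/(z+2)$, i.e. $f(t) = e^{-t} - e^{-2t}$, are hypothetical.

```python
from fractions import Fraction
from math import pi, sin

def shifted_chebyshev_u(count):
    """Coefficient lists (ascending powers of xi) of U*_0 ... U*_{count-1}.

    Uses U*_{m+1} = (4 xi - 2) U*_m - U*_{m-1}, i.e. the recurrence of the
    Chebyshev polynomials of the second kind with x = 2 xi - 1.
    """
    polys = [[1], [-2, 4]]
    while len(polys) < count:
        prev, last = polys[-2], polys[-1]
        nxt = [0] * (len(last) + 1)
        for j, a in enumerate(last):
            nxt[j] -= 2 * a          # contribution of -2 * U*_m
            nxt[j + 1] += 4 * a      # contribution of 4 xi * U*_m
        for j, a in enumerate(prev):
            nxt[j] -= a              # contribution of -U*_{m-1}
        polys.append(nxt)
    return polys[:count]

def sine_coefficients(y):
    """b_k = U*_{k-1}(y) for k = 1 .. len(y), reading the power y^j as y[j]."""
    return [sum(a * y[j] for j, a in enumerate(poly))
            for poly in shifted_chebyshev_u(len(y))]

# Moments y_k = L(k + 1) of the sample transform; f(0) = 0 holds already.
y = [Fraction(1, k + 2) - Fraction(1, k + 3) for k in range(9)]
b = sine_coefficients(y)

# Sum the sine series at theta = pi/2, i.e. xi = 1/2, where f = xi - xi^2 = 1/4.
theta = pi / 2
approx = 4 / pi * sum(float(bk) * sin((k + 1) * theta) for k, bk in enumerate(b))
```

The moments are kept as exact fractions because the integer coefficients of $U_{k-1}^*$ grow as quickly as those of the Legendre scheme.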




29. Inversion by Fourier series. The strongly nonorthogonal
behavior of equidistant points on the real axis of $\mathcal{L}(z)$ changes to
an entirely different behavior, if equidistant points of the imaginary
axis are considered. We will make use of the fact that in
communication problems the indicial function (the pulse response)
is practically limited to a finite interval, since beyond the
memory time $T_0$ the pulse response becomes negligibly small. This
fact permits us to develop a solution of the inversion problem
which utilizes only equidistant values of $\mathcal{L}(z)$ along the imaginary
axis.

We start our discussion with the following imaginary experiment.
Instead of giving the unit pulse only once, we will repeat it
rhythmically in intervals of $T_0$, or in intervals larger than $T_0$. Then
the pulse response is also repeated rhythmically, but the successive
responses do not overlap, since the time which separates them is
enough to extinguish the response of the previous period. We now
have a situation in which both f(t) and g(t) are periodic functions of
the period

$$T \geq T_0$$

We will put

$$\omega_0 = \frac{2\pi}{T} \tag{4-29.1}$$


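The input function, if resolved into its harmonic components, can be
written as follows (this series does not converge):

$$f(t) = \frac{2}{T}\left[\frac{1}{2} + \cos\frac{2\pi}{T}t + \cos 2\,\frac{2\pi}{T}t + \cdots\right] = \frac{\omega_0}{\pi}\left[\frac{1}{2} + \cos\omega_0 t + \cos 2\omega_0 t + \cdots\right] \tag{4-29.2}$$

The corresponding output g(t) becomes, if we remember the
definition (18.4) of the frequency response $\varphi(\omega) = \mathcal{L}(i\omega)$:

$$g(t) = K(t) = \frac{\omega_0}{2\pi}\left\{\mathcal{L}(0) + \sum_{k=1}^{\infty}\left[\mathcal{L}(ik\omega_0)\,e^{ik\omega_0 t} + \mathcal{L}(-ik\omega_0)\,e^{-ik\omega_0 t}\right]\right\} \tag{4-29.3}$$

or, separating real and imaginary parts of $\mathcal{L}(i\omega)$,

$$\mathcal{L}(i\omega) = A(\omega) + iB(\omega)$$

$$K(t) = \frac{\omega_0}{\pi}\left[\frac{1}{2}A(0) + \sum_{k=1}^{\infty} A(k\omega_0)\cos k\omega_0 t - \sum_{k=1}^{\infty} B(k\omega_0)\sin k\omega_0 t\right] \tag{4-29.4}$$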


Here we have a resolution of K(t) into a Fourier series, based on
equidistant values of the Laplace transform along the imaginary axis.
If we introduce the new variable $z_1$ by putting

$$z = i\omega_0 z_1$$

and rewrite the polynomials p(z) and q(z) in the new variable, we have
again the simple task of substituting the integers 0, 1, 2, ⋯ in two
polynomials and forming their ratio. The only inconvenience is that
the coefficients of these polynomials are now alternatingly real and
imaginary numbers. The result of the substitution will be a complex
number, and in the end we have to divide one complex number by
another. The resultant complex number gives in its real part the
coefficients of the cosine series, in its imaginary part the negative
coefficients of the sine series.
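The series (4) translates directly into a short routine. The sketch below is an illustration under stated assumptions: the function names, the period T = 20 and the number of terms are chosen for the example, and the sample transform $\mathcal{L}(z) = 1/((z+1)(z+2))$ has the known pulse response $K(t) = e^{-t} - e^{-2t}$, which is practically extinct long before t = 20.

```python
from math import cos, sin, pi, exp

def transform(z):
    """Sample Laplace transform p(z)/q(z) with p = 1 and q = (z + 1)(z + 2)."""
    return 1 / ((z + 1) * (z + 2))

def pulse_response_fourier(L, t, period, terms):
    """Sum the series (4) using values of L at the points i*k*w0 only."""
    w0 = 2 * pi / period
    total = 0.5 * L(0j).real
    for k in range(1, terms + 1):
        value = L(1j * k * w0)
        # real part -> cosine coefficient, imaginary part -> minus sine coefficient
        total += value.real * cos(k * w0 * t) - value.imag * sin(k * w0 * t)
    return w0 / pi * total

approx = pulse_response_fourier(transform, 1.0, period=20.0, terms=2000)
exact = exp(-1.0) - exp(-2.0)
```

In this example the terms decrease only like $k^{-2}$, which is why so many of them are summed. A smaller number of terms suffices once the convergence is improved.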

Two difficulties still remain. The convergence of the resulting
series may be too slow. Moreover, we may not know in advance the
value of the memory time $T_0$. However, these difficulties can be
surmounted. The convergence of the series can be speeded up by
modifying the function K(t) properly. For this purpose we first
expand $\mathcal{L}(z)$ around the point z = ∞. We know in advance that for
large values of z we will have

$$\mathcal{L}(z) = \frac{\rho_1}{z} + \frac{\rho_2}{z^2} + \frac{\rho_3}{z^3} + \cdots \tag{4-29.5}$$

This means that the terms of the Fourier expansion (4) decrease too
slowly, namely with $k^{-1}$. But let us correct $\mathcal{L}(z)$ by putting

$$\mathcal{L}_1(z) = \mathcal{L}(z) - \frac{\rho_1}{z+1} - \frac{\rho_1 + \rho_2}{(z+1)^2} \tag{4-29.6}$$


Then for large values of z the decrease occurs according to the law
$z^{-3}$, and the convergence of the series for K(t) becomes satisfactory.
We can terminate the series at a properly chosen k by scanning the
imaginary axis for the maximum of $|\mathcal{L}_1(i\omega)|$ and then neglecting the
terms from a point $\omega_1$ on where the associated $|\mathcal{L}_1(i\omega_1)|$ has dropped
to a few per cent of the maximum. We now have to decide about the
choice of the frequency $\omega_0$, defined by (1). It is important only that
we shall not overstep the upper limit $2\pi/T_0$, while smaller values are
not harmful. If we adopt for $\omega_0$ the value

$$\omega_0 = \frac{\omega_1}{12} \tag{4-29.7}$$


we have a fairly reliable guarantee that we did not overestimate $\omega_0$.
The division of $\omega_1$ into 12 parts means that we assume that 12 sine
terms and 13 cosine terms will be sufficient to represent K(t) with an
accuracy of a few per cent. We will seldom go wrong with this
assumption, since a Fourier series of 25 terms has great flexibility in
representing even rather rugged functions. Afterwards we can check
up on the correctness of our hypothesis. The function obtained by the
series is not the original K(t) but the following modification of it:

$$K_1(t) = K(t) - \rho_1 e^{-t} - (\rho_1 + \rho_2)\,t\,e^{-t} \tag{4-29.8}$$

This function starts from zero and has a horizontal tangent at zero.

$$K_1(0) = 0, \qquad K_1'(0) = 0 \tag{4-29.9}$$

These boundary conditions are responsible for the good convergence
of the Fourier series. If we plot the course of $K_1(t)$, evaluated on the
basis of the series (4), we can verify the adequacy of our choice (7)
by finding that $K_1(t)$ flattens out and becomes practically zero before
the upper limit $t = 2\pi/\omega_0$ is reached.


30. Inversion by Laguerre functions. Our first two solutions of the
inversion problem are applicable to any Laplace transform. These
two solutions made use of equidistant values of $\mathcal{L}(z)$ along the real
axis. Our third solution was specifically shaped to the demands of
network analysis. We took advantage of the fact that the range of
integration, although theoretically extending to infinity, was in
actual fact reducible to a finite time interval $T_0$, the “memory time” of
the network. In the present section a method will be discussed which
is again of general applicability. It is based on Fourier’s reciprocity
theorem, which solves the problem of inverting the Fourier transform.
The inversion of the Laplace transform is reducible to Fourier’s
problem.

Already in § 24 we have briefly touched on the relation between
Laplace and Fourier transforms. If we consider the Laplace transform
$\mathcal{L}(z)$ for purely imaginary values $i\omega$ of z, we obtain the Fourier
transform of the indicial function f(t). Now we can use Fourier’s
reciprocity theorem (17.16) and obtain f(t) as the Fourier transform
of $\mathcal{L}(i\omega)$.

$$\mathcal{L}(i\omega) = \int_0^\infty f(t)\,e^{-i\omega t}\,dt \tag{4-30.1}$$

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \mathcal{L}(i\omega)\,e^{i\omega t}\,d\omega = \frac{1}{2\pi i}\int_{-i\infty}^{+i\infty} \mathcal{L}(z)\,e^{zt}\,dz \tag{4-30.2}$$


Although f(t) is thus obtained in the form of a definite integral, 
the actual evaluation of this integral is by no means trivial, in view 
of the infinite limits of integration. We will develop a method which 
permits us to obtain numerically the Fourier transform of a function 
which is given in either analytical or numerical form. For this 
purpose we make use of a remarkable conformal mapping of the 
complex plane on itself, encountered earlier in the chapter on 
algebraic equations (cf. I, 18). This transformation mapped the 
entire right complex half plane into the inside of the unit circle. The 
unit circle itself became the image of the infinite imaginary axis. 
Hence an integration along the imaginary axis is changed into an 
integration along the unit circle. 

We transform the complex variable z of the Laplace transform 
into a new variable v, according to the following transformation: 


$$v = \frac{1-z}{1+z}, \qquad z = \frac{1-v}{1+v}, \qquad dz = -\frac{2\,dv}{(1+v)^2} = -\frac{(1+z)^2}{2}\,dv \tag{4-30.3}$$


In the new variable the integral (30.2) is extended over the unit circle 


$$v = e^{i\theta} \tag{4-30.4}$$


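The limits of the angle variable θ are [−π, +π].

$$f(t) = \frac{1}{2\pi i}\int_{-i\infty}^{+i\infty} \mathcal{L}(z)\,e^{zt}\,dz = \frac{1}{4\pi}\int_{-\pi}^{+\pi} \mathcal{L}(z)\,(1+z)^2\,e^{zt+i\theta}\,d\theta \tag{4-30.5}$$

Let us now introduce the function

$$\mathcal{L}_1(z) = \frac{(1+z)^2}{2}\left[\mathcal{L}(z) - \frac{p_{n-1}}{1+z}\right] \tag{4-30.6}$$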

We have seen (cf. 25.2) that the order of p(z) is generally one lower
than the order of q(z). Modification of $\mathcal{L}(z)$ by subtracting the last
term of (6) has the purpose that the difference between the orders of
numerator and denominator, if the two fractions are brought to
one denominator, shall be increased to two. Then the multiplication
by $(1+z)^2$ balances the orders of numerator and denominator
and $\mathcal{L}_1(z)$ remains finite even at z = ∞. In compliance with the
modification of $\mathcal{L}(z)$ we will put

$$f(t) = p_{n-1}\,e^{-t} + f_1(t) \tag{4-30.7}$$

The function $f_1(t)$ now starts from zero at t = 0.

We will now transform $\mathcal{L}_1(z)$ to the new variable v. This can be
accomplished by multiplying the coefficients of the numerator and
similarly the coefficients of the denominator by the pretabulated
matrix $B_n$ (cf. Table II). This transformation is quickly accomplished,
since the coefficients of this matrix are all integers. If we write $\mathcal{L}_1(v)$,
we mean that this operation has been performed already on the
original $\mathcal{L}_1(z)$. (If it so happens that the numerator of $\mathcal{L}_1(z)$ is of
lower order than the denominator, we consider the numerator
nevertheless a polynomial of nth order, replacing the missing
coefficients by zeros.)

Since $\mathcal{L}_1(e^{i\theta})$ is a periodic function of θ, it is natural that we shall
expand it in a Fourier series.

$$\mathcal{L}_1(e^{i\theta}) = \sum_k \left(c_k\,e^{ik\theta} + c_{-k}\,e^{-ik\theta}\right) = \sum_k \left(c_k\,v^k + c_{-k}\,v^{-k}\right) \tag{4-30.8}$$

If the Fourier transform of an empirically given function is in
question, the coefficients $c_k$, $c_{-k}$ of this expansion can be obtained by
trigonometric interpolation (cf. §§ 12, 13). Then the representation
by a trigonometric series will not be exact, of course. It is enough,
however, to know the maximum absolute error ε of the expansion
at any point of the unit circle. We can now estimate the maximum
error of the Fourier transform (5), if f(t) is modified in the sense
of (7). The integral (5) shows that $\mathcal{L}_1(z)$ is multiplied along the
path of integration by a factor which has the absolute value 1, no
matter what the value of t is (t being real). Hence we can estimate
that the maximum absolute error η(t) of f(t), calculated on the basis
of (5), cannot surpass ε.

$$|\eta(t)| < \varepsilon \tag{4-30.9}$$


In the case of inverting the Laplace transform, much more can be
said. Here we know that $\mathcal{L}_1(v)$ is an analytical function of v which
has no singularities inside or on the unit circle. We can expand
$\mathcal{L}_1(v)$ in a Taylor series around the point v = 0.

$$\mathcal{L}_1(v) = c_0 + c_1 v + c_2 v^2 + \cdots \tag{4-30.10}$$

This expansion converges everywhere inside and on the unit circle.
Hence we are here in the fortunate situation that we possess the
exact Fourier coefficients of $\mathcal{L}_1(v)$ along the unit circle, without any
integration. The comparison with the general expansion (8) shows
that the $c_{-k}$ are not present at all. This corresponds to the fact
that f(t) is identically zero for all values of t which are less than
zero.

In network problems the coefficients $c_k$ can be obtained by a
simple and elegant algorithm, called the “synthetic division of two
polynomials,” encountered earlier in studying the roots of an
algebraic equation (cf. I, 14). The only difference is that in our earlier
scheme that expansion proceeded in reciprocal powers of the variable,
while now an expansion in direct powers of v is demanded. But this
means only that the coefficients of the polynomials p(v) and q(v) are
written in reversed order, starting with the absolute term and ending
with the highest power. The absolute term $q_0$ of the denominator
cannot vanish, since q(v) can have no root inside the unit circle and
certainly no root at the center v = 0. Hence we can always divide
both numerator and denominator by $q_0$, thus making the absolute
term of the denominator equal to 1. We put the successive coefficients
of the numerator (in increasing order) on the “fixed strip,” and the
successive coefficients of the denominator (going from the bottom
to the top and changing the sign of all coefficients beyond 1) on the
“movable strip.” Then the regular movable strip technique (I, 8)
generates on the nascent strip the successive coefficients $c_k$. We may
continue with the algorithm until 15 to 20 coefficients are obtained.
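The arithmetic performed by the movable strip is the ordinary division of two power series. A minimal sketch follows; it reproduces the recurrence only, not the strip layout, and the function name and the sample ratio 1/(3 + v) are assumptions made for the illustration.

```python
from fractions import Fraction

def series_division(p, q, count):
    """First `count` coefficients c_k of p(v)/q(v) = c_0 + c_1 v + c_2 v^2 + ...

    p and q list their coefficients in ascending powers of v, and q[0] must
    not vanish. Each c_k follows from q[0] c_k = p_k - sum_j q[j] c_{k-j}.
    """
    c = []
    for k in range(count):
        acc = Fraction(p[k]) if k < len(p) else Fraction(0)
        for j in range(1, min(k, len(q) - 1) + 1):
            acc -= q[j] * c[k - j]
        c.append(acc / q[0])
    return c

# 1 / (3 + v) = 1/3 - v/9 + v^2/27 - v^3/81 + ...
c = series_division([1], [3, 1], 4)
```

The single root of this denominator lies at v = −3, outside the unit circle, so that the coefficients decrease geometrically.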
Our problem is now reduced to inversion of the special Laplace
transform

$$\frac{2v^k}{(1+z)^2} = \frac{2(1-z)^k}{(1+z)^{k+2}} = (-1)^k\,\frac{2z_1^k}{(2+z_1)^{k+2}} \tag{4-30.11}$$

We have put

$$z = 1 + z_1 \tag{4-30.12}$$

We will solve this problem in steps. In the first place, introducing
$z_1$ as a new variable, we obtain

$$\mathcal{L}(z) = \int_0^\infty f(t)\,e^{-zt}\,dt = \int_0^\infty g(t)\,e^{-z_1 t}\,dt \tag{4-30.13}$$

where

$$g(t) = f(t)\,e^{-t} \tag{4-30.14}$$

Now we know that the indicial function of the transform $(2+z_1)^{-1}$ is

$$g(t) = e^{-2t}$$

We also know that multiplication by t means negative differentiation
in the transform plane. Hence the transform of the indicial function

$$\frac{t^{k+1}\,e^{-2t}}{(k+1)!}$$

becomes

$$\mathcal{L}(z_1) = \frac{1}{(2+z_1)^{k+2}}$$

Similarly, multiplication by $z_1$ in the transform plane means
differentiation in the original plane. Hence the indicial function of

$$\mathcal{L}(z_1) = \frac{(k+1)!\,z_1^{k+1}}{(2+z_1)^{k+2}} \tag{4-30.15}$$

becomes

$$g(t) = \frac{d^{k+1}}{dt^{k+1}}\left(t^{k+1}\,e^{-2t}\right) \tag{4-30.16}$$


An important class of orthogonal polynomials, called “Laguerre
polynomials,” are defined by the following operation.¹

$$L_n(t) = e^{t}\,\frac{d^n}{dt^n}\left(t^n e^{-t}\right) \tag{4-30.17}$$

Hence

$$L_n(2t) = e^{2t}\,\frac{d^n}{dt^n}\left(t^n e^{-2t}\right) \tag{4-30.18}$$

On the basis of (15) to (18) the following statements can be made:

$$\text{Indicial function:}\quad g(t) = \frac{e^{-2t}}{(k+1)!}\,L_{k+1}(2t)$$

$$\text{Laplace transform:}\quad \mathcal{L}(z_1) = \frac{z_1^{k+1}}{(2+z_1)^{k+2}}$$

Moreover,

$$\text{Indicial function:}\quad g(t) = e^{-2t}\left[\frac{L_k(2t)}{k!} - \frac{L_{k+1}(2t)}{(k+1)!}\right]$$

$$\text{Laplace transform:}\quad \mathcal{L}(z_1) = \frac{z_1^k}{(2+z_1)^{k+1}} - \frac{z_1^{k+1}}{(2+z_1)^{k+2}} = \frac{2z_1^k}{(2+z_1)^{k+2}}$$

Going back to the original variable z, we have to multiply g(t)
by $e^{t}$. Hence we come to the conclusion that the inversion of the
Laplace transform (11) becomes

$$(-1)^k\,e^{-t}\left[\frac{L_k(2t)}{k!} - \frac{L_{k+1}(2t)}{(k+1)!}\right] \tag{4-30.19}$$

We will introduce the so-called “Laguerre functions” (cf. 5-20.16)

$$\varphi_k(t) = e^{-t/2}\,\frac{L_k(t)}{k!} \tag{4-30.20}$$

which form an ortho-normal function system in the range [0, ∞] of
the variable t. The expression (19) can now be rewritten in the form

$$(-1)^k\left[\varphi_k(2t) - \varphi_{k+1}(2t)\right]$$

and the inversion of the series (10) becomes

$$f_1(t) = c_0\varphi_0(2t) - (c_0 + c_1)\varphi_1(2t) + (c_1 + c_2)\varphi_2(2t) - \cdots = \sum_{k=0}^{\infty} (-1)^k\,(c_{k-1} + c_k)\,\varphi_k(2t) \tag{4-30.21}$$

with $c_{-1} = 0$. We have seen before that termination of this series to a
finite expansion involves an error which for no value of t can surpass
the error of the truncated series (10).

1 Cf. {1}, p. 93.

The Laguerre functions $\varphi_k(2t)$ can be pretabulated in reasonably
small intervals of the variable t and up to a reasonable order k (for
example, k = 20).¹ Then, by substituting for t a sufficiently dense set
of values, we can plot the resulting expansion (21). We must not
forget, however, to add the correction (7), in order to restore the
complete pulse response of the network. This correction merely
means that the coefficient of $\varphi_0(2t) = e^{-t}$ is changed as follows:

$$c_0' = c_0 + p_{n-1} \tag{4-30.22}$$
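The expansion (21) can be sketched without tables by generating the Laguerre functions from their three-term recurrence. The following is an illustration under stated assumptions: $\varphi_k(2t)$ is taken as $e^{-t}$ times the Laguerre polynomial of order k normalized to 1 at the origin, the factor 1/2 is assumed in the definition (6), and the function names are hypothetical. For the sample transform $\mathcal{L}(z) = 1/((z+1)(z+2))$ we have $p_{n-1} = 0$, so that no correction (22) is needed, and a short calculation gives $\mathcal{L}_1(v) = 1/(3+v)$, i.e. $c_k = (-1)^k/3^{k+1}$.

```python
from math import exp

def laguerre_functions(count, t):
    """phi_k(2t) for k = 0 .. count-1, with phi_k(2t) = exp(-t) * l_k(2t).

    l_k is the Laguerre polynomial with l_k(0) = 1, generated by
    (k + 1) l_{k+1}(x) = (2k + 1 - x) l_k(x) - k l_{k-1}(x).
    """
    x = 2.0 * t
    values = [1.0, 1.0 - x]
    for k in range(1, count - 1):
        values.append(((2 * k + 1 - x) * values[k] - k * values[k - 1]) / (k + 1))
    return [exp(-t) * v for v in values[:count]]

def laguerre_inversion(c, t):
    """Truncated series (21): sum of (-1)^k (c_{k-1} + c_k) phi_k(2t), c_{-1} = 0."""
    phi = laguerre_functions(len(c), t)
    total, previous = 0.0, 0.0
    for k, ck in enumerate(c):
        total += (-1) ** k * (previous + ck) * phi[k]
        previous = ck
    return total

# Taylor coefficients of L_1(v) = 1 / (3 + v) for the sample transform.
c = [(-1) ** k / 3.0 ** (k + 1) for k in range(40)]
approx = laguerre_inversion(c, 1.0)
exact = exp(-1.0) - exp(-2.0)   # known pulse response exp(-t) - exp(-2t) at t = 1
```

The rapid decrease of the $c_k$ in this example reflects the distance of the root v = −3 from the unit circle; a root close to the circle would slow the convergence accordingly.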


In retrospect we can say that we have obtained four convenient
numerical schemes for construction of the pulse response of a
network, given by its structural elements, without evaluating the roots
of the polynomial q(z). While the first two methods utilized as
key values equidistant values of $\mathcal{L}(z)$ along the real axis, the third
method utilized equidistant values along the imaginary axis, and the
present method made use of the infinitesimal neighborhood of the
single point z = 1. The value of $\mathcal{L}(z)$ and all its derivatives at z = 1
uniquely determine the coefficients $c_k$ of the expansion (11) and thus
the indicial function f(t).

In all these solutions one fundamental condition has to be fulfilled.
The network must not contain a high-frequency vibration of
small damping. Proper representation of such a vibration in our
expansions would require an inordinate number of terms. However,
a vibration of this kind corresponds to a root very near to the
imaginary axis. Such roots can be spotted with no great difficulty
(cf. I, 19) and their effect determined separately. We have to
subtract the contribution of these roots (cf. 25.5) before we start our
numerical scheme of inverting the Laplace transform.

1 Such a tabulation of the normalized Laguerre functions was prepared by
Miss Fanny Gordon, staff member of the National Bureau of Standards, during
the author’s stay with the Institute for Numerical Analysis, National Bureau of
Standards, Los Angeles, California. It is reproduced as Table X of the Appendix,
by permission of the Bureau.


31. Interpolation of the Laplace transform. The Laplace transform
$\mathcal{L}(z)$, being an analytical function of the complex variable z,
is uniquely determined if it is known on an arbitrarily small
continuous arc of the complex plane. Of greater interest, however, is
the method of prescribing an analytical function in an infinity of
equidistant points and obtaining the values everywhere else by
interpolation and extrapolation. Such an interpolation formula for
the Laplace transform can actually be developed.

We have seen in § 27 that by giving the Laplace transform at the 
integer points z = 1, 2, 3, … along the real axis we can uniquely 
determine the "indicial function" f(t), which is transformed into 
ℒ(z) by the transformation (27.4). Now let us substitute the infinite 
expansion (27.11) for f(ξ) and integrate term by term. For this 
purpose we need the definite integrals 

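$$\omega_k(z+1) = \int_0^1 \xi^z P_k^*(\xi)\,d\xi = \sum_{\alpha=0}^{k} \frac{p_{k\alpha}}{z+\alpha+1} \tag{4-31.1}$$

where $p_{k\alpha}$ denotes the coefficient of $\xi^\alpha$ in $P_k^*(\xi)$. 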


If these fractions are brought to one denominator, we obtain in the 
numerator a polynomial of the order k, while the denominator 
becomes the product 


$$(z+1)(z+2)(z+3)\cdots(z+k+1)$$

The numerator can likewise be resolved into its root factors, and we 
can see from the fact that any integer power $\xi^z$, z = 0, 1, 2, …, k − 1, 
is orthogonal to $P_k^*(\xi)$ that these root factors must be 

$$z(z-1)(z-2)\cdots(z-k+1)$$

and what remains undetermined is a mere constant in front of these 
factors. But then, letting z go to infinity and remembering that 
$P_k^*(1) = 1$, we find that the undecided numerical constant becomes 1. 
Introducing the symbol 

$$\begin{bmatrix} z \\ k \end{bmatrix} = \frac{z(z-1)(z-2)\cdots(z-k+1)}{(z+1)(z+2)\cdots(z+k+1)} = \frac{(z!)^2}{(z+k+1)!\,(z-k)!} \tag{4-31.2}$$



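we obtain 

$$\omega_k(z+1) = \begin{bmatrix} z \\ k \end{bmatrix} \tag{4-31.3}$$

and the interpolation formula for the Laplace transform becomes 

$$\mathcal{L}(z+1) = \sum_{k=0}^{\infty} c_k \begin{bmatrix} z \\ k \end{bmatrix} = \sum_{k=0}^{\infty} (2k+1)\,P_k^*(y) \begin{bmatrix} z \\ k \end{bmatrix} \tag{4-31.4}$$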


The symbolic notation $P_k^*(y)$ (cf. § 27) has the following significance. 
We form the polynomial $P_k^*(y)$ and replace $y^\alpha$ by ℒ(α + 1), i.e., the 
value of ℒ(z) at the integer point z = α + 1. The coefficients of the 
interpolation formula (4) are thus determined by the values of ℒ(z) 
at the points 1, 2, 3, …. If the condition 

$$\sum_{k=0}^{\infty} |c_k| = \text{finite} \tag{4-31.5}$$


is fulfilled, the formula (4) will converge for all z points to the right 
from the imaginary axis. 
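
The interpolation formula (4) can be tried out numerically. The following is only a minimal sketch in Python (not part of the original text); the function names are our own, and the closed form used for the coefficients of the shifted Legendre polynomials on [0, 1] is an assumption about the normalization of § 27 (it does satisfy $P_k^*(1) = 1$). Exact rational arithmetic is used because the coefficients of $P_k^*$ grow rapidly with k.

```python
from fractions import Fraction
from math import comb


def shifted_legendre_coeffs(k):
    """Coefficients of xi**0 .. xi**k in P_k^*(xi), normalized so that P_k^*(1) = 1."""
    return [(-1) ** (k + a) * comb(k, a) * comb(k + a, a) for a in range(k + 1)]


def bracket(z, k):
    """The symbol [z; k] = z(z-1)...(z-k+1) / ((z+1)(z+2)...(z+k+1))."""
    value = Fraction(1)
    for j in range(k):
        value *= z - j
    for j in range(1, k + 2):
        value /= z + j
    return value


def interpolate_laplace(samples, z):
    """Approximate L(z + 1) from samples[m - 1] = L(m), m = 1, ..., n."""
    total = Fraction(0)
    for k in range(len(samples)):
        # Symbolic rule: expand P_k^*(y) and replace y**a by L(a + 1).
        p_k_of_y = sum(c * samples[a] for a, c in enumerate(shifted_legendre_coeffs(k)))
        total += (2 * k + 1) * p_k_of_y * bracket(z, k)
    return total


# Hypothetical test case: f(t) = exp(-t), whose transform is L(z) = 1/(z + 1).
samples = [Fraction(1, m + 1) for m in range(1, 7)]
print(interpolate_laplace(samples, Fraction(1, 2)))  # value of L(3/2), printed as 2/5
```

For this particular transform the expansion terminates after two terms, so the sketch reproduces ℒ(3/2) = 2/5 exactly; for a general transform the number of retained terms governs the accuracy.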

While this interpolation formula establishes the fact that the 
Laplace transform is uniquely determined if we know its value at all 
integer points, it would be a mistake to believe that these values can 
be prescribed at will. In the first place we know from the arbitrariness 
of the scale that the “unit distance” can be stretched out to any 
length. Hence we could thin out our originally given data to any 
degree we like. 

Furthermore, the transformation 


$$z = \alpha + z_1, \qquad f(t) = f_1(t)\,e^{\alpha t} \tag{4-31.6}$$

shows that the points $z_1$ = 1, 2, 3, … correspond to the points z = α + 1, 
α + 2, α + 3, …. Hence not only can we thin out our data to any 
degree, but we can start them at a point which is arbitrarily far from 
the origin. Our functional data are thus always redundant to an 
infinite degree and we can never reduce them to a basic system which 
is both necessary and sufficient. This seems in peculiar contradiction 
to the fact that the $c_k$ can always be constructed to an arbitrarily 
given set of ℒ(m) values (m = 1, 2, 3, …). However, if these values 
are not properly given, the sum (5) will not converge, and our 
interpolation formula loses its significance. 



Another interesting conclusion can be drawn from the interpola- 
tion formula (4). The Laplace transform associated with an arbitrary 
four-terminal network is always the ratio of two polynomials. This 
is only a very special class of Laplace transforms, characterized by 
those indicial functions which are defined as a finite linear combina- 
tion of exponential functions with exponents whose real part is 
negative. Yet the convergence of the expansion (4) means that an 
arbitrary Laplace transform can be generated with any degree of 
accuracy with the help of electric networks, in fact with the help of 
purely "ballistic" networks, without any self-inductance L. Indeed, 
any absolutely integrable function of bounded variation can be 
approximated by a finite Legendre expansion of the form (27.11), 
with an error which can be made as small as we wish. The corre- 
sponding Laplace transform (4) can be written as the ratio of two 
polynomials, with the zeros of the denominator normalized to 
z = −1, −2, −3, …. This shows that not only can we simulate an 
arbitrarily given pulse response K(t) with the help of RC circuits, 
but we can normalize the constants of these circuits in such a way 
that the products RC shall be in the ratio 1 : 1/2 : 1/3 : 1/4 : ⋯ to one 
another. Particularly instructive is the application of the interpolation 
formula (4), truncating the number of terms to n, to the 
function 

$$\frac{1}{(z+\alpha)^2+\beta^2}$$


thus demonstrating explicitly that the output of a single RCL 
circuit is replaceable by the output of a sufficient number of properly 
coupled RC circuits, although the output of every one of these 
circuits is a monotonically decreasing function, without any 
vibrations. 

We now come to the second inversion method, studied in § 28. 
Here the indicial function f(t) was expanded in a Fourier sine 
series; the coefficients of this series were obtained with the help of 
the same ℒ(m + 1) as before, but weighted by a different set of 
weight factors (the coefficients of the Chebyshev instead of Legendre 
polynomials). The integrals which take the place of (1) are now 

$$\omega_k(z+1) = \int_0^1 \xi^z \sin k\theta\,d\xi = \frac{1}{2}\int_0^{\pi} \cos^{2z}\frac{\theta}{2}\,\sin k\theta\,\sin\theta\,d\theta \tag{4-31.7}$$



This integral is expressible in terms of gamma functions and we 
obtain 


$$\omega_k(z+1) = k\sqrt{\pi}\,\frac{(z+\tfrac12)!\,z!}{(z+k+1)!\,(z-k+1)!} \tag{4-31.8}$$
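
The gamma-function expression for these integrals can be checked numerically. The sketch below (Python, our own function names) assumes the substitution ξ = cos²(θ/2) and the closed form k√π (z + ½)! z! / ((z + k + 1)! (z − k + 1)!), with the factorial of a non-integer argument understood as x! = Γ(x + 1); it compares that closed form with a plain midpoint-rule quadrature of the θ integral.

```python
import math


def omega_closed_form(z, k):
    """k * sqrt(pi) * (z + 1/2)! z! / ((z + k + 1)! (z - k + 1)!), with x! = gamma(x + 1)."""
    def fact(x):
        return math.gamma(x + 1.0)
    return k * math.sqrt(math.pi) * fact(z + 0.5) * fact(z) / (fact(z + k + 1) * fact(z - k + 1))


def omega_quadrature(z, k, n=20000):
    """Midpoint rule for (1/2) * integral over [0, pi] of cos(t/2)**(2z) sin(kt) sin(t) dt."""
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        total += math.cos(theta / 2) ** (2 * z) * math.sin(k * theta) * math.sin(theta)
    return 0.5 * h * total


# Hypothetical sample arguments; z is kept larger than k - 2 to stay away from gamma poles.
for z, k in [(2.0, 3), (2.3, 2), (0.7, 1)]:
    print(z, k, omega_closed_form(z, k), omega_quadrature(z, k))
```

The two columns agree to many decimals for the sample arguments, which supports the reading of the formula but is of course no proof.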
Once more we introduce a simplified notation. 
ee ee) 
Gleen 1)! (4-31.9) 


and obtain the interpolation formula 


L = —= b, ; 
(z+ 1) Fad k Ae, T S kU: wi, | aso 


The poles of the functions (9) are at z = −3/2, −5/2, −7/2, … but these 
functions can no longer be written as the ratio of two polynomials, 
and are thus not realizable with the help of electric circuits. 

Still another interpolation formula is derivable if the third method 
of inverting the Laplace transform is used (cf. § 29). This method is 
restricted to the case of a finite integration time $T_0$, as encountered in 
network problems. The points of interpolation are here equidistant 
points along the imaginary axis. The method of deriving the inter- 
polation formula is the same as that before; we introduce the 
expansion obtained for f(t) in the definition of the Laplace transform 
and integrate term by term. We thus obtain 


L(2) = eT! > Lik o) = a 


4-31.11 
Lie nik (4-31.11) 


The basic interpolating functions are here free of any poles and 
cannot be interpreted in network terms. 

Finally in the fourth solution of the inversion problem we have 
expanded £(z) around the point z = 1, after transforming z into a 
new variable v. However, the resulting expansion can be formulated 
in the original variable z and written as follows: 


Aa =2 > Ll au) 2H (: - e) A - - (4-31.12) 


where the notation $L_k(-2y)$ is used operationally in the sense that in 
expanding the Laguerre polynomial $L_k(-2y)$, we should replace $y^\alpha$ 
by $\mathcal{L}^{(\alpha)}(1)$. This expansion converges for all complex z points whose 
real part is positive. 

If this interpolation formula is interpreted in network terms, we 
see that now an arbitrary Laplace transform is generated with the 
help of ballistic circuits whose products RC are crowded around the 
constant value 1. The interpolating functions have a pole at the 
point z = −1, but this pole is of ever-increasing order k. Now a pole 
of the multiplicity k at z = −1 is equivalent to a superposition of k 
RC circuits of properly chosen strengths, and with products RC which 
differ from 1 by the arbitrarily small values $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_k$. 

While in this interpolation (or rather extrapolation) formula the 
key values are taken from the infinitesimal neighborhood of ℒ(z) 
around the point z = 1, we can obtain a still different interpolation 
formula by using as key values the entire set of ℒ(iω) values along 
the imaginary axis. For this purpose we consider that the entire 
right half plane of the complex variable z is mapped into the inside 
of the unit circle of the variable v. In this new frame of reference we 
can make use of Cauchy’s integral theorem: 

$$\mathcal{L}(v) = \frac{1}{2\pi i}\oint \frac{\mathcal{L}(v_0)\,dv_0}{v_0 - v} \tag{4-31.13}$$

where $v_0$ moves along the unit circle. Going back to the original 
variable z we obtain 


al +o L£iw) dw 
B= 5049 E E 


Since an integral is the limit of a sum, we observe that an arbitrary 
indicial function—and the Laplace transform generated by it—can 
also be simulated in terms of pure LC circuits, without any R, i.e., 
with the help of pure sinusoidal vibrations, without any damping. 
We have here the extreme counterpart of the ballistic circuits of the 
interpolation formula (4), as if the key values, distributed along the 
real axis, had swung around by 90°. Yet there is a fundamental 
difference between the two cases. The strength of the simulating 
circuits is in the imaginary case continuous and without any extremes, 
because of the orthogonality of the basic functions. In the real case, 
however, the strength of the simulating ballistic circuits has to be 
chosen in an extreme fashion, because of the strongly nonorthogonal 
character of the basic interpolating functions. 


(4-31.14) 



Bibliographical References 


[1] Bush, V., Operational Circuit Analysis (Wiley, New York, 1929). 

[2] Carslaw, H. S., Theory of Fourier Series and Integrals (Macmillan, London, 1921). 

[3] Churchill, R. V., Fourier Series and Boundary Value Problems (McGraw-Hill, New York, 1941). 

[4] Franklin, Ph., Fourier Methods (McGraw-Hill, New York, 1949). 

[5] Guillemin, E. A., Mathematics of Circuit Analysis (Wiley, New York, 1949). 

[6] Jackson, D., Fourier Series and Orthogonal Polynomials (Math. Assoc. of America, Oberlin, 1941). 

[7] Thomson, W. T., Laplace Transformation (Prentice-Hall, New York, 1950). 

[8] Widder, D. V., The Laplace Transform (Princeton University Press, Princeton, 1941). 

[9] Wiener, N., The Fourier Integral and Its Applications (Cambridge University Press, New York, 1933). 


Article 


[10] Danielson, G. C., and Lanczos, C., "Improvements in Practical Fourier Analysis and Their Application to X-ray Scattering from Liquids," J. Franklin Inst., 233, 365, 435 (1942). 


DATA ANALYSIS 


1. Historical introduction. The limited accuracy of physical 
observations has been recognized since ancient days. Archimedes 
estimated the minimum size of the universe on the basis of the 
absence of any observed astronomical aberration within the error 
limits of the ancient instruments, assuming that Aristarchus’ helio- 
centric theory is accepted as correct. Hand in hand with the ever- 
increasing accuracy of measuring instruments, a definite mathematical 
problem took shape. Let us assume that we have occasion to 
observe the same phenomenon repeatedly, under practically identical 
conditions. "Practically identical" means that the decisive factors 
which are responsible for a certain physical event remain the same, 
while uncontrollable side events change in a random fashion. For 
example, we may measure repeatedly the relation between displace- 
ment and time of the motion of a ball, falling freely toward the earth. 
The mass of the ball and the force of gravity, which determine 
the event, remain unchanged, while slight imperfections of the 
length- and time-measuring instruments will vary erratically. 
What these "erratic" or "random" variations mean in each case 
is often difficult to establish. But the demand arose for a general 
mathematical procedure by which surplus measurements could be 
handled, from the viewpoint of taking the best advantage of all 
measurements. 

In the beginning of the nineteenth century C. F. Gauss (in 1809) 
and A. M. Legendre (in 1806) discovered independently a remarkable 
universal method by which surplus measurements could be adjusted. 
This method, known as the “method of least squares,” gave an 
answer to the problem of surplus data, based purely on mathematical 
logic rather than on inductive physical reasoning. Later the method 
was further developed and became the cornerstone of all statistical 
considerations. The great success with which this method could be 
applied to an innumerable variety of physical and engineering 
problems leads to the conclusion that the method of least squares, 
originally discovered by sheer mathematical intuition, is neverthe- 
less in fundamental harmony with the structure of the physical 
universe. 

The problem of analyzing data has still another important aspect. 
Measurements always occur in a discrete set of points, while the 
functions we postulate as the basis of our measurements exist for a 
continuous range of the variable. Essentially we always tabulate 
functions, whether that tabulation is the result of mere calculations 
or of physical observations. This brings up the question of defining 
the unknown function between the points of tabulation. This is the 
problem of "interpolation" which fascinated mathematical research 
from the earliest times. The great analysts of the seventeenth 
century were well acquainted with the methods of equidistant inter- 
polation and developed the theory of finite differences to a remarkable 
degree. They used already ordinary and also central differences. 
The pioneering work of James Gregory (1638-1675) was further 
advanced by Isaac Newton (1642-1727); J. Stirling (1692-1770) and 
later F. W. Bessel (1784-1846) added further formulas. The 
rigorous treatment of interpolation theory starts with H. Hahn 
and L. Fejér (1918). The dangers of equidistant polynomial inter- 
polation were discovered independently by C. Runge (1901) and 
E. Borel (1903). 


2. Interpolation by simple differences. One of the most funda- 
mental formulas of analysis is the Taylor expansion which represents 
an analytical function f(x) in terms of the derivatives of f(x), all 
taken at the point x = 0. 

$$f(x) = f(0) + f'(0)\,x + f''(0)\,\frac{x^2}{2!} + \cdots + f^{(n)}(0)\,\frac{x^n}{n!} + \cdots \tag{5-2.1}$$

This formula can be extended to finite differences instead of derivatives. 
Let f(x) be given at the equidistant points 

$$x = 0,\ h,\ 2h,\ 3h,\ \ldots,\ nh,\ \ldots \tag{5-2.2}$$


and let us form the successive "difference coefficients" 

$$\frac{\Delta f(0)}{\Delta x} = \frac{f(h) - f(0)}{h}, \qquad \frac{\Delta^2 f(0)}{\Delta x^2} = \frac{f(2h) - 2f(h) + f(0)}{h^2}, \qquad \ldots \tag{5-2.3}$$

These quantities take the place of the successive derivatives $f^{(k)}(0)$. 
The functions 

$$\varphi_k(x) = \frac{x^k}{k!} \tag{5-2.4}$$

are characterized by the functional equation 

$$\varphi_k'(x) = \varphi_{k-1}(x) \tag{5-2.5}$$

They have to be replaced by a class of functions which shall satisfy 
the equation 

$$\frac{\Delta \varphi_k}{\Delta x} = \frac{\varphi_k(x+h) - \varphi_k(x)}{h} = \varphi_{k-1}(x) \tag{5-2.6}$$

the solution of which is 

$$\varphi_k(x) = \frac{x(x-h)(x-2h)\cdots(x-kh+h)}{k!} \tag{5-2.7}$$

Our equations gain greatly in simplicity if we choose a scale for the 
independent variable x which makes h = 1. The kth difference 
coefficient can now be replaced by the kth difference itself and the 
auxiliary functions $\varphi_k(x)$ become 

$$\varphi_k(x) = \frac{x(x-1)(x-2)\cdots(x-k+1)}{k!} \tag{5-2.8}$$
We thus obtain the Gregory-Newton interpolation formula 

$$f(x) = f(0) + \Delta f(0)\,x + \Delta^2 f(0)\,\frac{x(x-1)}{2} + \cdots + \Delta^k f(0)\,\frac{x(x-1)\cdots(x-k+1)}{k!} + \cdots \tag{5-2.9}$$


The Taylor series has the character of an "extrapolating" (predicting) 
series, since the value of f(x) and all its derivatives at the origin x = 0 
serve to evaluate f(x) outside the origin. The series (9), on the other 
hand, can serve for evaluation of f (x) between the data points. Hence 
it is of an interpolating type. 

The numerical application of formula (9) is greatly facilitated by 
setting up a “difference table,” according to the following pattern. 


 x     y    Δy   Δ²y   Δ³y   Δ⁴y

 0    10    11    −5     3    −4
 1    21     6    −2    −1
 2    27     4    −3
 3    31     1                        (5-2.10)
 4    32


If, e.g., the value of y = f(x) is requested at the point x = 0.4, 
application of (9) gives 

$$y(0.4) = 10 + 11\cdot 0.4 - 5\,\frac{(0.4)(-0.6)}{2} + 3\,\frac{(0.4)(-0.6)(-1.6)}{6} - 4\,\frac{(0.4)(-0.6)(-1.6)(-2.6)}{24} = 15.36$$

The "zero line" of formula (9) is flexible and can be shifted to any 
integer value of x, since the values of x are not more than an ordering 
scheme which may start at any point. If, for example, f(2.1) is 
demanded, we would naturally interpolate from the point x = 2 on, 
instead of starting from x = 0. 

$$f(2.1) = 27 + 4\cdot 0.1 - 3\,\frac{(0.1)(-0.9)}{2} = 27.535$$

We thus gain in convergence, although we may lose in the number of 
available terms if we are too far down in the scheme. This deficiency 
can be remedied, however, by filling in the holes downward. The last 
column remains −4, and thus the next unoccupied diagonal from the 
right to the left downward becomes −4, −5, −8, −7, 25. Hence at 
the point 2.1 we could have added the term 

$$-5\,\frac{(0.1)(-0.9)(-1.9)}{6} = -0.14$$
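
The difference table and the Gregory-Newton series are easily mechanized. The following Python sketch (function names are our own, unit spacing h = 1 is assumed) builds the top diagonal of the difference table from a chosen "zero line" and sums the series (9).

```python
def forward_differences(ys):
    """Return [y0, Δy0, Δ²y0, ...], the top diagonal of the simple difference table."""
    diagonal = []
    column = list(ys)
    while column:
        diagonal.append(column[0])
        column = [b - a for a, b in zip(column, column[1:])]
    return diagonal


def gregory_newton(ys, x, start=0, terms=None):
    """Sum the Gregory-Newton series with the zero line shifted to index `start`."""
    diffs = forward_differences(ys[start:])
    if terms is not None:
        diffs = diffs[:terms]
    u = x - start
    total, factor = 0.0, 1.0  # factor runs through u(u-1)...(u-k+1)/k!
    for k, d in enumerate(diffs):
        total += d * factor
        factor *= (u - k) / (k + 1)
    return total


ys = [10, 21, 27, 31, 32]                     # the y column of table (10)
print(forward_differences(ys))                # [10, 11, -5, 3, -4]
print(round(gregory_newton(ys, 0.4), 4))      # 15.3584
print(round(gregory_newton(ys, 2.1, start=2), 4))  # 27.535
```

Appending the extrapolated value 25 to the data and calling `gregory_newton(ys + [25], 2.1, start=2)` adds the third-difference term −0.1425 discussed above.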



3. Interpolation by central differences. Simple differences are 
advisable only at the beginning or the end of a table of functional 
values. ("Beginning" and "end" are relative terms and become interchangeable 
by reverse ordering of the x values. Hence the "end" of a 
table is covered by our previous discussion.) In the middle of the 
table, however, we can take good advantage of the fact that every 
functional value y(k) has a neighbor to the right and to the left. Only 
at the two ends of the table does it happen that we run out of 
neighbors, viz., neighbors to the left at the beginning and neighbors 
to the right at the end of the table. As soon as we have somewhat 
advanced in our table, we can set up a “central difference table” 
which tapers off toward the middle from both ends in the shape of 
two triangles, one being the reflection of the other. 

The principle of a central difference table is once more that we take 
the difference of two consecutive values of the same column. Only the 
arrangement of these values is different. The result is not written 
parallel to the upper of the two values but halfway between the two 
values involved. For this reason we distinguish between "full lines" 
and "half lines". The original data are written on the "full lines". 
The arrangement is shown in the following scheme. 


 x     y     δy    δ²y    δ³y    δ⁴y

 0    10
            11
 1    21            −5
             6              3
 2    27            −2             −4
             4             −1                (5-3.1)
 3    31            −3             −4
             1             −5
 4    32            −8             −3
            −7             −8
 5    25           −16
           −23
 6     2

The table in its present form is full of gaps. On both the full lines 
and the half lines only every second term is present. We are now going 
to fill out these gaps by the following simple device which holds 
universally for all columns: Take the arithmetic mean of the numbers 
above and below the gap. Hence the completed central difference 
scheme will finally look as follows. 


 x      y       δy     δ²y     δ³y    δ⁴y

 0      10
 0.5    15.5    11
 1      21      8.5    −5
 1.5    24      6      −3.5     3
 2      27      5      −2       1     −4
 2.5    29      4      −2.5    −1     −4        (5-3.2)
 3      31      2.5    −3      −3     −4
 3.5    31.5    1      −5.5    −5     −3.5
 4      32     −3      −8      −6.5   −3
 4.5    28.5   −7     −12      −8
 5      25    −15     −16
 5.5    13.5  −23
 6       2


The great advantage of this central difference scheme lies in its 
increased convergence. For example in the line 3 we find the second 
and third differences as —3, —3 while in the original arrangement 
these values would be —8, —8. Moreover, since we possess the half 
lines in addition to the full lines, we can interpolate from both 
kinds of lines, and thus the x of the interpolation formula need not 
go beyond ±0.25, compared with the ±0.5 of the previous table. 
This, too, is of great advantage by increasing the accuracy with the 
same number of differences. An interpolation with the help of three 
successive differences will almost always be sufficient in a reasonably 
spaced set of data, if central differences are used. 
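
The construction of the completed central difference scheme can be sketched as follows (Python, our own naming; positions are counted in half steps so that full and half lines share one index, and exact fractions are used so that the means come out exactly).

```python
from fractions import Fraction


def central_difference_table(ys, order=4):
    """Return {x: [y, δy, δ²y, ...]} for x = 0, 1/2, 1, ..., with gaps filled by means."""
    n = len(ys)
    # Column c holds the differences of order c; position p corresponds to x = p / 2.
    columns = [{2 * i: Fraction(y) for i, y in enumerate(ys)}]
    for _ in range(order):
        prev = columns[-1]
        keys = sorted(prev)
        # Each difference is written halfway between the two values involved.
        columns.append({(a + b) // 2: prev[b] - prev[a] for a, b in zip(keys, keys[1:])})
    # Fill every gap with the arithmetic mean of the entries above and below it.
    for col in columns:
        keys = sorted(col)
        for a, b in zip(keys, keys[1:]):
            col[(a + b) // 2] = (col[a] + col[b]) / 2
    return {Fraction(p, 2): [col[p] for col in columns if p in col]
            for p in range(2 * (n - 1) + 1)}


table = central_difference_table([10, 21, 27, 31, 32, 25, 2])
print([float(v) for v in table[Fraction(3)]])      # [31.0, 2.5, -3.0, -3.0, -4.0]
print([float(v) for v in table[Fraction(7, 2)]])   # [31.5, 1.0, -5.5, -5.0, -3.5]
```

The two printed rows are the lines 3 and 3.5 of the completed scheme (2).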

The fundamental $\varphi_k(x)$ functions associated with a central 
difference table are defined by the functional equation 

$$\varphi_k(x+1) + \varphi_k(x-1) - 2\varphi_k(x) = \varphi_{k-2}(x) \tag{5-3.3}$$

starting with $\varphi_0 = 1$, $\varphi_1(x) = x$. 
We have to differentiate between even and odd orders, according to 
the following formula. 

$$\varphi_{2k}(x) = \frac{x^2(x^2-1)(x^2-4)\cdots(x^2-(k-1)^2)}{(2k)!}, \qquad \varphi_{2k-1}(x) = \frac{x(x^2-1)(x^2-4)\cdots(x^2-(k-1)^2)}{(2k-1)!} \tag{5-3.4}$$


The functions for the half lines were found by Bessel. We thus 
obtain the interpolation formulas of Stirling and Bessel. Stirling's 
formula, using any full line as base line, is 

$$y(x) = y_0 + (\delta y)_0\,x + (\delta^2 y)_0\,\frac{x^2}{2} + (\delta^3 y)_0\,\frac{x(x^2-1)}{6} + (\delta^4 y)_0\,\frac{x^2(x^2-1)}{24} + (\delta^5 y)_0\,\frac{x(x^2-1)(x^2-4)}{120} + \cdots \tag{5-3.5}$$

Bessel's formula, using any half line as base line, is 

$$y(x) = y_0 + (\delta y)_0\,x + (\delta^2 y)_0\,\frac{x^2-\frac14}{2} + (\delta^3 y)_0\,\frac{x(x^2-\frac14)}{6} + (\delta^4 y)_0\,\frac{(x^2-\frac14)(x^2-\frac94)}{24} + (\delta^5 y)_0\,\frac{x(x^2-\frac14)(x^2-\frac94)}{120} + \cdots \tag{5-3.6}$$


As an example for the use of these formulas let us apply them to 
the table (2), evaluating y(3.18). We are near to the full line x = 3 
and thus we will use Stirling's formula. We find on line 3 the values 
31, 2.5, −3, −3 and thus obtain 

$$y(3.18) = 31 + 2.5\cdot 0.18 - 3\,\frac{(0.18)^2}{2} - 3\,\frac{(0.18)(0.18^2-1)}{6} = 31.49$$

On the other hand, let us obtain y(3.41). Here we are near to the 
half line 3.5 and we can use Bessel's formula, with x = −0.09. The 
values found on line 3.5 are 31.5, 1, −5.5, −5, and we obtain 

$$y(3.41) = 31.5 - 1\cdot 0.09 - 5.5\,\frac{(0.09)^2-0.25}{2} + 5\cdot 0.09\,\frac{(0.09)^2-0.25}{6} = 32.06$$
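
Both worked examples can be reproduced with a few lines of Python. This is only a sketch under our own naming: `row` is a line of the completed central difference scheme, [y, δy, δ²y, δ³y, ...], and the series are cut off after the fourth difference.

```python
def stirling(row, x):
    """Stirling's formula on a full line; x is measured from that line."""
    basis = [1.0, x, x**2 / 2, x * (x**2 - 1) / 6, x**2 * (x**2 - 1) / 24]
    return sum(d * b for d, b in zip(row, basis))


def bessel(row, x):
    """Bessel's formula on a half line; x is measured from that half line."""
    q = x**2 - 0.25
    basis = [1.0, x, q / 2, x * q / 6, q * (x**2 - 2.25) / 24]
    return sum(d * b for d, b in zip(row, basis))


print(round(stirling([31, 2.5, -3, -3], 0.18), 2))   # 31.49
print(round(bessel([31.5, 1, -5.5, -5], -0.09), 2))  # 32.06
```

Since `zip` stops at the shorter sequence, passing only the first few entries of a line uses correspondingly fewer differences.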


We will finally discuss the question of interpolation in the case of 
nonequidistant arguments. The operation with “divided differences” 
is very cumbersome. However, their use can frequently be avoided 
on the basis of the following consideration. We introduce an auxiliary 
variable t by some functional relation 

$$x = u(t) \tag{5-3.7}$$

Then an equidistant tabulation in t corresponds to a nonequidistant 
tabulation in x. If the changeable scale of tabulation is sufficiently 
smooth, we can assume that we will interpolate in the variable t 
rather than in x. But then the previous interpolation formulas 
remain in force, in spite of the variable scale. The only change is that 
the x of the previous formulas has to be replaced by 

$$\frac{x - x_0}{x_1 - x_0} \tag{5-3.8}$$


where x) and x, are the two points between which we interpolate. 
This consideration shows that the assumption of a strictly equidistant 
tabulation of f (x) is not very critical to the application of the Stirling 
or the Bessel formula of interpolation as long as the changing scale 
follows a smooth mathematical law. 


4. Differentiation of a tabulated function. The operation d/dx 
is not directly applicable to a function which is merely given in 
discrete equidistant data points. After the interpolation, however, 
we possess f (x) everywhere and can now perform the differentiation. 
We are usually interested in the slope of the curve at the same points 
in which the observations were made. Hence we can differentiate 
our interpolation formulas at x = 0 and thus obtain a relation 
between the d/dx and the Δ/Δx operations. 

Simple differences. In the case of simple differences we use the 
Gregory-Newton formula (2.9) and obtain 

$$f'(0) = \Delta f(0) - \frac{\Delta^2 f(0)}{2} + \frac{\Delta^3 f(0)}{3} - \cdots \tag{5-4.1}$$

which gives the following operational relation: 

$$D = \Delta - \Delta^2/2 + \Delta^3/3 - \cdots = \log(1+\Delta) \tag{5-4.2}$$


The convergence of this formula is very slow. 


1 For the general polynomial interpolation at arbitrary points by means of the 
Lagrangian interpolation formula cf. VI, 10. For the ingenious method of Aitken 
(and its later modification by Neville), which reduces any polynomial interpola- 
tion to a succession of linear interpolations, cf. {4}, p. 84; [4], p. 76. 



Central differences. Much better convergence is obtained in the 
case of central differences. The differentiation of Stirling’s formula 
gives the relation 

$$D = \delta - \delta^3/6 + \delta^5/30 - \cdots \tag{5-4.3}$$

The convergence compared with the simple difference formula (2) 
is greatly improved. Of great advantage also is the fact that only 
the differences of odd order enter the formula. 

Even stronger convergence is obtained if we differentiate Bessel’s 
formula (3.6). Here we get 

$$D = \delta - \frac{1}{24}\,\delta^3 + \frac{3}{640}\,\delta^5 - \cdots \tag{5-4.4}$$

The factor of δ³ is 4 times, the factor of δ⁵ is 7 times smaller than on 
the full lines. We thus see that the slope of a curve can be ascertained 
with particularly great accuracy halfway between the points of 
observation. 
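
The three differentiation series may be compared on a function whose derivative is known. The Python sketch below is an illustration under our own assumptions: the series are truncated after the third difference, the test function sin x with h = 0.1 is a hypothetical choice, and the mean differences of Stirling's formula are written out directly in terms of the neighboring ordinates.

```python
import math


def derivative_simple(f, x0, h):
    """h f'(x0) = Δ - Δ²/2 + Δ³/3, using simple differences to the right of x0."""
    y = [f(x0 + i * h) for i in range(4)]
    d1 = y[1] - y[0]
    d2 = y[2] - 2 * y[1] + y[0]
    d3 = y[3] - 3 * y[2] + 3 * y[1] - y[0]
    return (d1 - d2 / 2 + d3 / 3) / h


def derivative_stirling(f, x0, h):
    """h f'(x0) = δ - δ³/6 on a full line, with δ and δ³ the means of the adjacent half lines."""
    ym2, ym1, y1, y2 = (f(x0 + i * h) for i in (-2, -1, 1, 2))
    mean_d1 = (y1 - ym1) / 2
    mean_d3 = (y2 - 2 * y1 + 2 * ym1 - ym2) / 2
    return (mean_d1 - mean_d3 / 6) / h


def derivative_bessel(f, x_half, h):
    """h f'(x_half) = δ - δ³/24, where x_half lies midway between two data points."""
    ym1, y0, y1, y2 = (f(x_half + (i - 0.5) * h) for i in (-1, 0, 1, 2))
    d1 = y1 - y0
    d3 = y2 - 3 * y1 + 3 * y0 - ym1
    return (d1 - d3 / 24) / h


h = 0.1
print(abs(derivative_simple(math.sin, 1.0, h) - math.cos(1.0)))
print(abs(derivative_stirling(math.sin, 1.0, h) - math.cos(1.0)))
print(abs(derivative_bessel(math.sin, 1.05, h) - math.cos(1.05)))
```

For this test function the errors decrease from the simple to the Stirling to the Bessel form, in line with the remarks above on the relative size of the coefficients.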


5. The difficulties of a difference table. While the calculus of 
finite differences is an eminently important tool of applied analysis, 
we can employ it only with great caution if observed functions are 
involved. The accuracy of a calculated mathematical table is usually 
sufficiently good to set up a difference table. If, however, y = f (x) 
is merely observed in equidistant intervals, the higher differences have 
the unpleasant quality of greatly magnifying the errors of the observa- 
tions. Let us assume that in a sequence of observations our data are 
correct except for one observation where an error of 1 unit appears. 
Let us see how this isolated error will propagate through a central 
difference scheme. 


 y     δy    δ²y    δ³y    δ⁴y

 0
       0
 0                          1
       0             1
 0            1            −4
       1            −3
 1           −2             6
      −1             3
 0            1            −4
       0            −1
 0                          1
       0



We see from this table that a single peak spreads out more and 
more to a mountain of increasing base and increasing height. The 
increase occurs according to the law of the binomial coefficients. 
While this increase does not seem too rapid, it is actually sufficient 
to destroy the usefulness of a difference table in many instances. Let 
us keep in mind that the operation with higher differences presupposes 
a certain smoothness of the function. This means that the higher 
differences decrease rapidly and are quickly negligibly small. A 
function tabulated in small intervals—such as a customary logarithm 
table for example—has seldom differences higher than second order 
which are still significant. Here the interpolation is reduced to one 
or two terms. On the other hand, the use of higher differences can 
be very useful by making possible a tabulation which proceeds in 
much larger intervals. But then it is necessary that the functional 
values shall be given with very great accuracy. A closely spaced table 
of a few decimal places is thus replaceable by a widely spaced table, 
provided that the key values are given with excessive accuracy in 
order to make a differencing up to sufficiently high order possible. 
Such conditions, however, cannot be matched if the key values of the 
table are not mathematically calculated, but physically observed 
quantities. The interval in which the observations proceed is usually 
small enough to make an interpolation by two or three central 
differences sufficiently accurate. But the difficulty is that the magni- 
fication of the errors in the higher differences completely masks the 
true values of the higher differences because of two opposing 
trends: the true differences strongly decreasing, the errors strongly 
increasing. The result is that the difference table of an empirically 
observed function behaves quite differently from that of a mathe- 
matically evaluated function. The first and second differences are not 
too much modified, but the relative errors are already large. The 
higher differences show a completely erratic behavior and cannot be 
used for computational purposes. 
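
The binomial growth of an isolated error is quickly confirmed by forming the successive difference columns of a sequence that is zero except for a single unit error. The following Python lines are a minimal sketch (the helper name is our own).

```python
def difference_columns(ys, order):
    """Return the columns [y, first differences, second differences, ...] up to `order`."""
    columns = [list(ys)]
    for _ in range(order):
        prev = columns[-1]
        columns.append([b - a for a, b in zip(prev, prev[1:])])
    return columns


errors = [0, 0, 0, 0, 1, 0, 0, 0, 0]   # one observation wrong by 1 unit
for k, column in enumerate(difference_columns(errors, 4)):
    print(k, column)
# the last column printed is [1, -4, 6, -4, 1]
```

An error of size ε in the data thus appears with the weight 6ε in the fourth difference, while the true fourth difference of a smooth, closely tabulated function is usually very small.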

Under these circumstances we have to discard the methods of 
higher differences if empirical and not mathematical functions are 
involved. We have to invent other methods for effective interpolation 
and differentiation of empirical functions, making use of the 
principles of "least squares." One particular feature of a difference 
table, however, deserves honorable mention: if there are glaring single 
errors among our observations which completely disrupt the smooth 
flow of a function, such errors stand out like illuminated spots 
against a generally dark background, if the test of a central difference 
table is applied. The fourth or fifth difference will now show some 
exceptionally high maxima, surrounded by terms of alternating sign. 
This locates the "bad" spots in our measurements and we should 
first adjust the data in a way that these anomalous peaks shall 
disappear. We can do that by dividing the peak value by the binomial 
coefficient 

$$(-1)^k \binom{2k}{k}$$

if the peak appeared in the 2kth difference, and subtracting this 
quantity from the y entry of the line in which the peak appeared. 
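
This correction rule may be sketched in Python as follows. The function name is our own, and the sketch rests on two assumptions: the true 2kth differences are negligible compared with the error, and the sign of the binomial factor is taken as (−1)^k, in agreement with the propagation pattern of a unit error (−2 in the second, +6 in the fourth difference).

```python
from math import comb


def correct_isolated_error(ys, k=2):
    """Locate the largest 2k-th central difference and remove the single error it implies."""
    col = list(ys)
    for _ in range(2 * k):
        col = [b - a for a, b in zip(col, col[1:])]
    # col[j] is the 2k-th central difference belonging to the line j + k.
    j = max(range(len(col)), key=lambda i: abs(col[i]))
    error = col[j] / ((-1) ** k * comb(2 * k, k))
    fixed = list(ys)
    fixed[j + k] -= error
    return j + k, error, fixed


# Hypothetical data: a cubic (vanishing fourth differences) with one bad entry.
clean = [x**3 for x in range(9)]
noisy = list(clean)
noisy[4] += 0.5
line, error, fixed = correct_isolated_error(noisy)
print(line, error, fixed == clean)   # 4 0.5 True
```

With real observations the remaining random errors prevent an exact restoration; the rule only removes the glaring part of the disturbance.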


6. The fundamental principle of the method of least squares. Let us 
assume that two variables are connected by a mathematical law whose 
form is known by hypothesis, although some of the constants of the 
law are unknown. If we have as many measurements as the number 
of unknown parameters demands, determination of these parameters 
is a purely algebraic problem. If, however, we have made more 
observations than necessary, we have a mathematical situation 
which we call "overdetermined" since the number of equations is 
greater than the number of unknowns. If the measurements were 
free of errors, we could simply discard the surplus equations, since 
they do not contribute anything new to the previous statements. 
Since, however, the measurements are not accurate, each new 
observation adds some new information of its own. The equations 
as they stand are not solvable and have to be “adjusted.” We do 
that by taking the difference between the theoretical value and the 
observation, calling this quantity the "residual." While it is in 
general impossible to find values of the unknown parameters which 
would make each of the residuals equal to zero, we can always find 
values of these parameters which will make the sum of the squares of 
the residuals a minimum. The magnitude of this minimum gives us 
a measure of the closeness of our observations. Assuming that the 
postulated mathematical law is correct, the minimum "zero" would 
mean that each residual is zero and therefore all our observations 
are perfect. The larger the minimum, the more our observations are 
in error. The square root of the sum of the squares of the minimized 
residuals, divided by their number, can be considered the "average 
error" of our measurements, although it is sometimes advisable to 
examine the distribution of the residuals and see whether some 
residuals are not conspicuous by their magnitude. If we find residuals 
which are more than 3 times the average error, we will prefer to 
discard these measurements altogether and minimize the sum of the 
squares of the remaining residuals. 

In many problems of physics and engineering the unknown 
parameters enter the given mathematical law in a linear fashion. The 
problem of minimizing the sum of the squares of the residuals is then 
equivalent to the minimizing of an algebraic expression of second 
order. The resultant equations are linear, with a symmetric and 
nonnegative (in most well-formulated problems even positive 
definite) matrix. These equations are called the “normal equations” 
of the given problem. 
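
For the simplest linear law, y = a + bx, the normal equations can be written down and solved explicitly. The following Python sketch is an illustration only: the function name and the sample data are our own, and the "average error" is taken here as the root mean square of the residuals, which is one possible reading of the definition given above.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a + b x through the normal equations."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:  n a + sx b = sy,   sx a + sxx b = sxy  (symmetric matrix).
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    residuals = [a + b * x - y for x, y in zip(xs, ys)]
    average_error = math.sqrt(sum(r * r for r in residuals) / n)
    return a, b, residuals, average_error


# Hypothetical observations scattered about the line y = 1 + 2x.
a, b, residuals, average_error = fit_line([0, 1, 2, 3], [1.1, 2.9, 5.1, 6.9])
print(a, b, average_error)
```

A residual exceeding about three times the returned average error would mark an observation as a candidate for rejection, after which the fit is repeated with the remaining data.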

The same principle is applicable in all expansion problems which 
involve orthogonal function systems. The problem is here to expand 
a given function into a linear sequence of prescribed functions. If 
this sequence is finite, we cannot obtain a perfect answer. Yet we can 
always obtain a “best” answer by taking the difference between the 
given function and its expansion and considering this difference as 
the "residual" of our approximation. Since we are not interested in 
approximating the function in one particular point only, but everywhere 
within a given continuous domain, the method of least squares 
now requires that we integrate the square of the residual over the 
given domain. The resultant normal equations then give the "best" 
approximation of f(x) in that domain, in terms of the prescribed 
functions. This procedure is the basis of the theory of orthogonal 
function systems and is of fundamental importance in almost all 
branches of applied analysis. 


7. Smoothing of data by fourth differences. In a sequence of 
observations which scatter on account of accidental errors, the 
question of interpolation is quite different from the corresponding 
problem of a mathematically tabulated function. The data points are 
usually so close together that mere linear interpolation would already 
suffice. The difficulty is, however, that the basic values themselves 
are inaccurate and would give a very “bumpy” curve if we simply 
join them by straight lines. We have to “smooth” our data in order 
to draw further conclusions from them. This means that by some
statistical considerations we try to reduce the influence of the random
errors. One simple method, which is the analytical counterpart of 
the French curve type of smoothing, is based on the use of fourth 
differences. 

We argue as follows. We assume that our data are sufficiently 
close together to justify the hypothesis that the second derivative 
does not change essentially during the course of a few measurements. 
In particular, we want to combine every measurement with its two 
neighbors to the left and to the right. This gives us altogether five 
consecutive observations, and we assume that these observations 
would lie very nearly on a parabola of second order, were it not for 
experimental errors. The theoretical course for these measurements 
is then given by the law 

$$ y = a + bx + cx^2 \tag{5-7.1} $$

where the coefficients a, b, c have to be adjusted to our data. However, 
these 3 parameters cannot be adjusted to 5 data, and thus we use the 
principle of least squares. We form the sum of the squared differences

$$ \sum_{k} \bigl( y(x_k) - y_k \bigr)^2 \tag{5-7.2} $$


and minimize this quantity with respect to a, b, c. In our case the 
data belong to the points x = —2, —1, 0, 1, 2, and thus the function 
to be minimized becomes 
$$
\begin{aligned}
      &(a - 2b + 4c - y_{-2})^2 \\
{}+{} &(a - b + c - y_{-1})^2 \\
{}+{} &(a - y_0)^2 \\
{}+{} &(a + b + c - y_1)^2 \\
{}+{} &(a + 2b + 4c - y_2)^2
\end{aligned}
\tag{5-7.3}
$$

The condition of minimum with respect to a and c gives the two 
“normal equations” 


$$
\begin{aligned}
5a + 10c - \sum_{k=-2}^{+2} y_k &= 0 \\
10a + 34c - \sum_{k=-2}^{+2} k^2 y_k &= 0
\end{aligned}
\tag{5-7.4}
$$

The equation for b is of no interest to us at the present moment. 
Our goal is to correct the center value $y_0$, which belongs to x = 0.
But then equation (1) shows that the theoretical value at x = 0 is
y = a. Hence we need only a. The solution of (7.4) for a (34 times the
first equation minus 10 times the second eliminates c) yields

$$
\begin{aligned}
70a &= -6y_{-2} + 24y_{-1} + 34y_0 + 24y_1 - 6y_2 \\
    &= 70y_0 - 6\,(y_{-2} - 4y_{-1} + 6y_0 - 4y_1 + y_2) \\
    &= 70y_0 - 6\,\delta^4 y_0
\end{aligned}
\tag{5-7.5}
$$

where $\delta^4 y_0$ denotes the fourth central difference of the zero line. We
thus find

$$ a = y_0 - \tfrac{3}{35}\,\delta^4 y_0 \tag{5-7.6} $$


and obtain the following simple method by which the “random 
scatter” of a sufficiently close set of data frequently can be effectively 
reduced. Construct a central difference table up to the order 4. 
Subtract from each ordinate 3/35 of the fourth difference associated
with that particular line. Since 3/35 is very near to 1/12, we can say that
the correction is practically $-\tfrac{1}{12}\,\delta^4 y_0$. Every single datum may be
corrected by this amount. The new data will fit a much smoother
curve. We can demonstrate that by constructing once more a difference
table and showing that the fourth differences are now considerably
reduced and less irregular than they were before. It is frequently
advisable to give the corrected data to one more decimal place than
the original data. This will assure a smoother fit. If the fourth
differences still show a rather irregular pattern the method may be
repeated a second time.
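The correction rule can be sketched in a few lines. This is an illustrative sketch only: the function name `smooth_fourth_difference` is an assumption, and the first two and last two ordinates are simply left untouched here because they lack neighbors on both sides.

```python
def smooth_fourth_difference(y):
    """Return a copy of y in which every interior ordinate is corrected
    by -(3/35) times its fourth central difference."""
    out = list(y)
    for i in range(2, len(y) - 2):
        d4 = y[i - 2] - 4 * y[i - 1] + 6 * y[i] - 4 * y[i + 1] + y[i + 2]
        out[i] = y[i] - 3.0 * d4 / 35.0
    return out


# Data lying exactly on a parabola have vanishing fourth differences
# and are therefore left unchanged.
parabola = [k * k for k in range(7)]
unchanged = smooth_fourth_difference(parabola)

# A single isolated error of size 1 is reduced to 17/35 and spread
# over its neighbors.
spike = smooth_fourth_difference([0, 0, 0, 1, 0, 0, 0])
```

Calling the function a second time on its own output corresponds to repeating the method, as suggested in the text.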


1 If the observations are so close together that 7 points, viz., 3 neighbors on
both sides plus the central point, can be joined by a least-square parabola of
second order, the correction formula becomes

$$ \bar{y}_0 = y_0 - \frac{9\,\delta^4 y_0 + 2\,\delta^6 y_0}{21} $$

In both cases the operation with a central difference table can be avoided and
replaced by a “movable strip technique” of the type (8.5). The code numbers of
the movable strip are in the case of two neighbors,

$$ \frac{-3,\ 12,\ \underset{\uparrow}{17},\ 12,\ -3}{35} $$

and in the case of three neighbors,

$$ \frac{-2,\ 3,\ 6,\ \underset{\uparrow}{7},\ 6,\ 3,\ -2}{21} $$

The arrow indicates the “zero line” opposite to which the result of the weighting
is written.

The following numerical example shows the application of this
method by applying it to the take-off performance measurements of
an airplane. The observations were made from second to second,
but in the table only every second measurement is recorded. This
was done because the observations were divided into two independent
groups, those belonging to the even and those to the odd seconds.
Each group was independently adjusted by the same method, and the
resulting graphs were compared. This provides a valuable check on
the results by giving an estimate of how much fluctuation is
caused by the scattering of the data. The following table contains
the calculation for one single group only. The other group was
handled in identical fashion and gave consistent results.


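  t |    y   |   δy  |  δ²y  |  δ³y  |   δ⁴y  | −(3/35)δ⁴y |    ȳ
  0 |    0   |       |       |       |        |            |    0.8
    |        |  25.0 |       |       |        |            |
  2 |   25   |       |  17.4 |       |        |            |   20.7
    |        |  42.4 |       |  44.8 |        |            |
  4 |   67.4 |       |  62.2 |       |  −95.5 |     8.2    |   75.6
    |        | 104.6 |       | −50.7 |        |            |
  6 |  172   |       |  11.5 |       |   22.1 |    −1.9    |  170.1
    |        | 116.1 |       | −28.6 |        |            |
  8 |  288.1 |       | −17.1 |       |  134.4 |   −11.5    |  276.6
    |        |  99.0 |       | 105.8 |        |            |
 10 |  387.1 |       |  88.7 |       | −201.2 |    17.2    |  404.3
    |        | 187.7 |       | −95.4 |        |            |
 12 |  574.8 |       |  −6.7 |       |  127.2 |   −10.9    |  563.9
    |        | 181.0 |       |  31.8 |        |            |
 14 |  755.8 |       |  25.1 |       |  −37.1 |     3.2    |  759.0
    |        | 206.1 |       |  −5.3 |        |            |
 16 |  961.9 |       |  19.8 |       |   17.4 |    −1.5    |  960.4
    |        | 225.9 |       |  12.1 |        |            |
 18 | 1187.8 |       |  31.9 |       |  −28.1 |     2.4    | 1190.2
    |        | 257.8 |       | −16.0 |        |            |
 20 | 1445.6 |       |  15.9 |       |   46.5 |    −4.0    | 1441.6
    |        | 273.7 |       |  30.5 |        |            |
 22 | 1719.3 |       |  46.4 |       |  −59.8 |     5.1    | 1724.4
    |        | 320.1 |       | −29.3 |        |            |
 24 | 2039.4 |       |  17.1 |       |    7.5 |    −0.6    | 2038.8
    |        | 337.2 |       | −21.8 |        |            |
 26 | 2376.6 |       |  −4.7 |       |   31.2 |    −2.7    | 2373.9
    |        | 332.5 |       |   9.4 |        |            |
 28 | 2709.1 |       |   4.7 |       |  −14.2 |     1.2    | 2710.3
    |        | 337.2 |       |  −4.8 |        |            |
 30 | 3046.3 |       |  −0.1 |       |        |            | 3046.4
    |        | 337.1 |       |       |        |            |
 32 | 3383.4 |       |       |       |        |            | 3383.3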
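The interior entries of the last column can be reproduced without a difference table by letting a movable strip glide down the data. The sketch below assumes the strip −3, 12, 17, 12, −3 (divided by 35), which is what halving the coefficients of (5-7.5) gives; the names `STRIP` and `smooth_with_strip` are hypothetical. Only the first nine ordinates of the table are used.

```python
STRIP = (-3, 12, 17, 12, -3)   # code numbers, to be divided by 35


def smooth_with_strip(y):
    """Apply the five-point movable strip to every interior ordinate."""
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(w * y[i + j - 2] for j, w in enumerate(STRIP)) / 35.0
    return out


# First nine ordinates of the take-off table (t = 0, 2, ..., 16).
y = [0, 25, 67.4, 172, 288.1, 387.1, 574.8, 755.8, 961.9]
smoothed = smooth_with_strip(y)

# Interior values for t = 4 ... 12, rounded to one decimal as in the table.
interior = [round(v, 1) for v in smoothed[2:7]]
```

For these ordinates `interior` should agree with the tabulated corrected values for t = 4 to t = 12; the two values at each end need the separate procedure described next in the text.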


320 Data Analysis Chap. V 


The first two and the last two data are not covered by the general 
procedure, since here we do not possess neighbors on both sides. 
We are thus forced to use neighbors on one side only and we will lay 
a least-square parabola of second order a + bx + cx² through five
consecutive points. This still amounts to the same procedure as
before, since we can agree that the value x = 0 shall belong to the
third point, designating the first two points by x = −2 and x = −1.
However, we now need the theoretical value (1) also for these two
points, adopting

$$ \bar{y}_{-2} = a - 2b + 4c \quad\text{and}\quad \bar{y}_{-1} = a - b + c $$

as the corrected values of the first two observations. The calculations
give the following result:

$$
\begin{aligned}
\bar{y}_{-2} &= y_{-2} + \tfrac{1}{5}\,\delta^3 + \tfrac{3}{35}\,\delta^4 \\
\bar{y}_{-1} &= y_{-1} - \tfrac{2}{5}\,\delta^3 - \tfrac{1}{7}\,\delta^4
\end{aligned}
$$

with the understanding that for $\delta^3$ and $\delta^4$ we use the nearest available
central differences, found along the upper slanted line. For example,
the correction of y(0), i.e., $y_{-2}$, becomes in our case

$$ \bar{y}(0) = 0 + \tfrac{1}{5}(44.8) + \tfrac{3}{35}(-95.5) = 0.8 $$

while the correction of y(2), i.e., $y_{-1}$, becomes

$$ \bar{y}(2) = 25 - \tfrac{2}{5}(44.8) - \tfrac{1}{7}(-95.5) = 20.7 $$

The corresponding formulas for the other end of the table become

$$
\begin{aligned}
\bar{y}_{2} &= y_{2} - \tfrac{1}{5}\,\delta^3 + \tfrac{3}{35}\,\delta^4 \\
\bar{y}_{1} &= y_{1} + \tfrac{2}{5}\,\delta^3 - \tfrac{1}{7}\,\delta^4
\end{aligned}
$$

using the nearest differences of the lower slanted line. This means
for our example

$$
\begin{aligned}
\bar{y}(32) &= 3383.4 - \tfrac{1}{5}(-4.8) + \tfrac{3}{35}(-14.2) = 3383.2 \\
\bar{y}(30) &= 3046.3 + \tfrac{2}{5}(-4.8) - \tfrac{1}{7}(-14.2) = 3046.4
\end{aligned}
$$
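The end-point rule can be sketched as follows, assuming the coefficients 1/5, 3/35 for the outermost ordinate and 2/5, 1/7 for its neighbor, which reproduce the worked values 0.8, 20.7 and 3046.4 above. The function name `end_corrections` is hypothetical; the third and fourth differences are formed from the first five and the last five ordinates, i.e. along the upper and lower slanted lines of the difference table.

```python
def end_corrections(y):
    """Corrected values for the first two and the last two ordinates of y
    (y must contain at least five equidistant values)."""
    # Differences along the upper slanted line (first five points).
    d3_top = y[3] - 3 * y[2] + 3 * y[1] - y[0]
    d4_top = y[4] - 4 * y[3] + 6 * y[2] - 4 * y[1] + y[0]
    # Differences along the lower slanted line (last five points).
    d3_bot = y[-1] - 3 * y[-2] + 3 * y[-3] - y[-4]
    d4_bot = y[-1] - 4 * y[-2] + 6 * y[-3] - 4 * y[-4] + y[-5]

    first = y[0] + d3_top / 5 + 3 * d4_top / 35
    second = y[1] - 2 * d3_top / 5 - d4_top / 7
    next_to_last = y[-2] + 2 * d3_bot / 5 - d4_bot / 7
    last = y[-1] - d3_bot / 5 + 3 * d4_bot / 35
    return first, second, next_to_last, last


# Head of the take-off table (t = 0 ... 8): differences 44.8 and -95.5.
head = end_corrections([0, 25, 67.4, 172, 288.1])
# Tail of the take-off table (t = 24 ... 32): differences -4.8 and -14.2.
tail = end_corrections([2039.4, 2376.6, 2709.1, 3046.3, 3383.4])
```

Here `head[0]` and `head[1]` correspond to the corrected y(0) and y(2), and `tail[2]` to the corrected y(30).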




8. Differentiation of an empirical function. The definition of the 
derivative as the limit of a difference coefficient is of little value if 
our observations are not free of errors. The ratio Δy/Δx gets
excessively sensitive to even small errors if Δx becomes very small.
The higher differences are even less reliable. Hence we cannot use 
the same methods for differentiation of equidistant data which were 
employed for mathematically accurate data. Once more we have 
to resort to least-square methods for solving the problem of differ- 
entiation. 

The hypothesis we have made in the previous section, namely, 
that the acceleration changes little during five consecutive obser- 
vations, will be valid in many situations. We can then lay a 
parabola of second order through these points, but this para- 
bola has to be obtained by least-square minimization, since 5 
points will generally not lie on a polynomial of second order. The 
procedure is exactly the same as the one followed in § 7. We 
combine every point with its two neighbors to the left and to the 
right. We again minimize the sum (7.3) and again obtain the 
normal equations (7.4). The only difference is that in our previous 
problem we wanted to obtain the corrected value of the function 
at the point x = 0, while now it is the corrected value of the 
derivative that we want to obtain at the same point. Hence it is 
now the constant b in which we are interested. The condition 
of minimum gives 


$$ 10b = -2y_{-2} - y_{-1} + y_1 + 2y_2 \tag{5-8.1} $$

or

$$ b = \frac{-2y_{-2} - y_{-1} + y_1 + 2y_2}{10} \tag{5-8.2} $$

If the interval between two observations is not 1 but h, then b has to 
be divided by h in order to get the derivative. Hence the final 
formula for the differentiation of an empirically given function with 
the help of five neighboring ordinates becomes 


$$ f'(x) = \frac{-2f(x - 2h) - f(x - h) + f(x + h) + 2f(x + 2h)}{10h} \tag{5-8.3} $$

If not 2 but k neighbors are used on both sides, the formula becomes

$$ f'(x) = \frac{\displaystyle\sum_{\alpha=-k}^{+k} \alpha\, f(x + \alpha h)}{\displaystyle 2 \sum_{\alpha=1}^{k} \alpha^2\, h} \tag{5-8.4} $$


We apply this formula numerically by making use of the “movable 
strip” technique. We put on the movable strip the numbers —2, 
—1, 0, 1, 2, and obtain the smoothed derivative by performing 
the indicated multiplications and then moving the strip down- 
ward. The division by 10 merely changes the position of the 
decimal point. 

The following data were obtained as the result of take-off measure- 
ments, giving the horizontal position of the airplane from second 
to second. We want to know the velocity of the airplane from 
second to second. The table indicates the method applied and 
the results of the calculation for a few data. The first two places 
are left blank since here we do not possess the neighbors on both 
sides. 


 strip |   y   |   y'
  −2   |   0   |
  −1   |   4   |
   0   |  25   | 18.08
   1   |  50   | 28.42
   2   |  67.4 | 36.89       (5-8.5)
       | 124.9 | 40.74
       | 172.0 | 51.79
       | 201.4 | 50.89
       | 288.1 | 55.01
       | 321.3 |
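The movable strip operation of formula (5-8.4) can be sketched as below. The function name `smoothed_derivative` is an assumption, and the data list is extended by the next ordinate 387.1 (taken from the later tables of this chapter) so that the value opposite 288.1 can be formed. Positions without k neighbors on both sides are returned as `None`.

```python
def smoothed_derivative(y, h=1.0, k=2):
    """Least-squares derivative using k neighbors on each side, formula (5-8.4)."""
    n = len(y)
    denom = 2 * sum(a * a for a in range(1, k + 1)) * h   # 10*h for k = 2
    out = [None] * n
    for i in range(k, n - k):
        # Strip -k, ..., -1, 0, 1, ..., k laid against the data, zero line at i.
        out[i] = sum(a * y[i + a] for a in range(-k, k + 1)) / denom
    return out


y = [0, 4, 25, 50, 67.4, 124.9, 172.0, 201.4, 288.1, 321.3, 387.1]
velocity = smoothed_derivative(y)
```

With h = 1 and k = 2 the entries `velocity[2]` to `velocity[8]` correspond to the velocities listed in table (5-8.5).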


Procedure for the two first and last ordinates. At the beginning and 
at the end of our observations we lose out in neighbors, since only 
neighbors on one side are at our disposal. This destroys the symmetry
of the process and reduces accuracy considerably. We are thus forced
to lay a least-square parabola through the first few points and use its
derivative at the points t = 0 and t = 1. It seems advisable to
choose only four instead of five points at the beginning of the 
curve, since here the physical conditions are not settled yet and 
we cannot count on the smoothness of the curve to the extent that 
we can later on. 

The solution of the normal equations for the present problem 
yields the following formulas for the first two missing velocities: 


$$
\begin{aligned}
f'(0) &= \frac{-21f(0) + 13f(h) + 17f(2h) - 9f(3h)}{20h} \\[4pt]
f'(h) &= \frac{-11f(0) + 3f(h) + 7f(2h) + f(3h)}{20h}
\end{aligned}
\tag{5-8.6}
$$

In our numerical example the first two missing velocities thus
become

$$ y'(0) = \tfrac{27}{20} = 1.35, \qquad y'(1) = \tfrac{237}{20} = 11.85 $$

These formulas are applicable also at the end of a series of observations,
in which case the ordinates f(0), f(h), f(2h), f(3h) have to be
changed to $f(x_n)$, $f(x_n - h)$, $f(x_n - 2h)$, $f(x_n - 3h)$, and the
formulas (6) give us $-f'(x_n)$ and $-f'(x_n - h)$.
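A short sketch of the one-sided formulas (5-8.6), including the sign reversal for the end of the series. The function names `start_derivatives` and `end_derivatives` are hypothetical.

```python
def start_derivatives(f0, f1, f2, f3, h=1.0):
    """f'(0) and f'(h) from the first four ordinates, formulas (5-8.6)."""
    d0 = (-21 * f0 + 13 * f1 + 17 * f2 - 9 * f3) / (20 * h)
    d1 = (-11 * f0 + 3 * f1 + 7 * f2 + f3) / (20 * h)
    return d0, d1


def end_derivatives(fn, fn1, fn2, fn3, h=1.0):
    """f'(x_n) and f'(x_n - h) from the last four ordinates, given in the
    order f(x_n), f(x_n - h), f(x_n - 2h), f(x_n - 3h)."""
    d0, d1 = start_derivatives(fn, fn1, fn2, fn3, h)
    # Running through the data backwards reverses the sign of the derivative.
    return -d0, -d1


# Numerical example of the text: ordinates 0, 4, 25, 50.
v0, v1 = start_derivatives(0, 4, 25, 50)

# Check on an exact parabola f(x) = x^2, sampled at x = 0, 1, 2, 3.
p_start = start_derivatives(0, 1, 4, 9)   # f'(0), f'(1)
p_end = end_derivatives(9, 4, 1, 0)       # f'(3), f'(2)
```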

It may happen that the sequence of our data is not rapid enough to 
justify the assumption that five consecutive data are practically on a 
parabola of second order. In such a situation we may feel safer if 
only four data are combined for one smoothing. We preserve 
symmetry if we agree that the velocities shall be obtained halfway 
between the data. The movable strip now contains four numbers 
only: −3, −1, 1, 3, and the zero line is halfway between −1 and +1. We
now miss only one value at the beginning and at the end. That
value is now obtained as the arithmetic mean of the two
quantities of (6):

$$ f'\!\left(\frac{h}{2}\right) = \frac{-8f(0) + 4f(h) + 6f(2h) - 2f(3h)}{10h} \tag{5-8.7} $$




Applying this alternative method to the previous set of observations 
we now obtain the following set of velocity values: 


 strip |   y   |   y'
  −3   |   0   |
       |       | (6.6)
  −1   |   4   |
       |       | 17.10
   1   |  25   |
       |       | 21.52
   3   |  50   |
       |       | 31.71
       |  67.4 |
       |       | 42.35       (5-8.8)
       | 124.9 |
       |       | 44.91
       | 172.0 |
       |       | 51.90
       | 201.4 |
       |       | 53.46
       | 288.1 |
       |       | 59.03
       | 321.3 |
       | 387.1 |


The first bracketed quantity was obtained on the basis of formula (7). 
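The four-point variant can be sketched as follows. The function name `midpoint_velocities` is hypothetical. The first value uses formula (5-8.7); the last value uses the mirror image of (5-8.7) with reversed sign, which is an assumption made here by analogy with the remark on formulas (6) and is not written out in the text.

```python
def midpoint_velocities(y, h=1.0):
    """Velocities halfway between consecutive data, four ordinates each.
    Entry i of the result belongs to the midpoint between y[i] and y[i + 1]."""
    n = len(y)
    v = []
    # First midpoint: one-sided formula (5-8.7).
    v.append((-8 * y[0] + 4 * y[1] + 6 * y[2] - 2 * y[3]) / (10 * h))
    # Interior midpoints: symmetric strip -3, -1, 1, 3 (divided by 10h).
    for i in range(1, n - 2):
        v.append((-3 * y[i - 1] - y[i] + y[i + 1] + 3 * y[i + 2]) / (10 * h))
    # Last midpoint: mirrored form of (5-8.7), sign reversed (assumption).
    v.append((8 * y[-1] - 4 * y[-2] - 6 * y[-3] + 2 * y[-4]) / (10 * h))
    return v


y = [0, 4, 25, 50, 67.4, 124.9, 172.0, 201.4, 288.1, 321.3, 387.1]
velocity = midpoint_velocities(y)
```

The first nine entries of `velocity` correspond to the column of table (5-8.8), starting with the bracketed value 6.6.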


9. Differentiation by integration. It is of interest to observe that 
the method of differentiation here described is more an integration 
than a differentiation process. Let us assume that the sequence of 
our observations is very dense, i.e., that the h in the formula (8.4) is 
very small. Then in the limit the formula (8.4) goes over into the 
following integration process. 


$$ f'(x) = \frac{3}{2\epsilon^3} \int_{-\epsilon}^{+\epsilon} t\, f(x + t)\, dt \tag{5-9.1} $$

where ε is sufficiently small. We can see by expanding f(x + t) into a
Taylor series that the operation on the right side of (1) actually gives
f'(x) with arbitrary accuracy at all points where f(x) is analytic. We
substitute on the right side

$$ f(x + t) = f(x) + t f'(x) + \frac{t^2}{2} f''(x) + \frac{t^3}{6} f'''(x) + \cdots \tag{5-9.2} $$

and obtain as a result of the integration:

$$ \frac{3}{2\epsilon^3} \int_{-\epsilon}^{+\epsilon} t\, f(x + t)\, dt = f'(x) + \frac{\epsilon^2}{10} f'''(x) + \cdots \tag{5-9.3} $$

Hence the error is of second order in ε. Moreover, formula (1)
establishes a derivative even in points where a derivative in the 
ordinary sense does not exist. This is exactly what we want in the 
presence of “noise,” since noise is a typically nonanalytical phenomenon
which destroys the analytical nature of the true f(x). This
comes into strong profile in the behavior of the difference table of an 
empirically given table. Formulas (4.3) and (4.4) indicate that the 
central differences of odd order should be used for evaluation of the 
derivative. But these differences get invalidated practically from the 
beginning by the presence of noise, and the more so the smaller h is, 
since the differences decrease with h, while the noise remains on the 
same level, thus causing an intolerably large relative error. On the 
other hand, a difference table can be extended also to the left, in 
form of the differences of negative order, which actually mean 
summation instead of differencing. The damaging influence of the 
noise in these sums is greatly diminished, because of the randomness
of the errors, which tend to balance them out in the process of 
summation. 
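The statements about formula (5-9.1) can be checked numerically. The sketch below evaluates the integral by Simpson's rule, which is a choice made here for illustration and not part of the text; the function name `derivative_by_integration` and the test functions are likewise assumptions. For a cubic the series (5-9.3) terminates, so the result should equal f'(x) + (ε²/10) f'''(x) up to rounding.

```python
def derivative_by_integration(f, x, eps, panels=2000):
    """Evaluate (3 / (2 eps^3)) * integral_{-eps}^{+eps} t f(x + t) dt
    with the composite Simpson rule (panels is made even if necessary)."""
    if panels % 2:
        panels += 1
    step = 2 * eps / panels
    total = 0.0
    for j in range(panels + 1):
        t = -eps + j * step
        if j == 0 or j == panels:
            w = 1
        elif j % 2:
            w = 4
        else:
            w = 2
        total += w * t * f(x + t)
    integral = total * step / 3
    return 3 * integral / (2 * eps ** 3)


# f(x) = x^3 at x = 1 with eps = 0.5: f' = 3, f''' = 6,
# so the operation should give 3 + 0.25 * 6 / 10 = 3.15.
cubic = derivative_by_integration(lambda u: u ** 3, 1.0, 0.5)

# f(x) = |x| has no ordinary derivative at x = 0, yet the operation is
# defined there; by symmetry it gives zero.
corner = derivative_by_integration(abs, 0.0, 0.5)
```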

The following formula of integrating by parts shows directly that 
the process indicated by the formula (8.4) is available if we possess 
the first two left columns of the difference table. 


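$$ \int_a^b x u'\, dx = \Bigl[\, x u \,\Bigr]_a^b - \int_a^b u\, dx \tag{5-9.4} $$

If we put u' = y and denote u by $D^{-1}y$ and the integral of u by
$D^{-2}y$, we can write

$$ \int_a^b x y\, dx = \Bigl[\, x D^{-1} y \,\Bigr]_a^b - \Bigl[\, D^{-2} y \,\Bigr]_a^b \tag{5-9.5} $$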




We will thus show that the table (8.5) can be obtained in a numeri- 
cally different manner which is sometimes even quicker than the 
previous algorithm. We now construct two columns of the differences 
of negative order, i.e., the sums and then the sums of the sums of the 
data. For the sake of simpler arrangements we write the columns to 
the right from the data column, although from the standpoint of the 
difference table they belong to the left: 


  x |   y   |   Σy   |  Σ²y
  0 |   0   |    0   |    0
  1 |   4   |    4   |    4
  2 |  25   |   29   |   33
  3 |  50   |   79   |  112
  4 |  67.4 |  146.4 |  258.4      (5-9.6)
  5 | 124.9 |  271.3 |  529.7
  6 | 172.0 |  443.3 |  973.0
  7 | 201.4 |  644.7 | 1617.7
  8 | 288.1 |  932.8 | 2550.5
  9 | 321.3 | 1254.1 | 3804.6
 10 | 387.1 | 1641.2 | 5445.8


Let us consider the line x = 8, y = 288.1. Since two neighbors 
were used, we go to the line x = 10, while the lower line x = 6 is 
diminished by 1, thus giving us x = 5. In the column Σy we find
the two numbers 1641.2 and 271.3. Applying the formula (5), the x
in the first term of the right side has to be put equal to 2, and since
at the lower limit x = −2, we form the sum 1641.2 + 271.3 and
multiply by 2.

(1641.2 + 271.3) · 2 = 3825.0

Now we come to the second term, which can be found in the column
Σ²y. We go to the lines 10 − 1 = 9 and 6 − 1 = 5 and find there
3804.6 and 529.7, the difference of which is formed.

3804.6 − 529.7 = 3274.9

We now have two terms and obtain their difference.

3825.0 − 3274.9 = 550.1

This agrees, except for the position of the decimal point, with the
value of y'(x) found in table (8.5), opposite the entry y = 288.1. We
have thus found an alternative method for construction of the 
velocity data. 
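The summation procedure just described can be sketched as follows. The function name `derivative_from_sums` is hypothetical, and the treatment of lines above the head of the table (where the sums are taken as zero) is an assumption made so that the first admissible line can also be handled.

```python
from itertools import accumulate


def derivative_from_sums(y, i, k=2, h=1.0):
    """Smoothed derivative at line i, obtained from the column of sums
    and the column of sums of sums instead of the movable strip."""
    s1 = list(accumulate(y))     # column of sums
    s2 = list(accumulate(s1))    # column of sums of sums
    lower = i - k - 1            # the lower line, diminished by 1
    s1_low = s1[lower] if lower >= 0 else 0.0
    s2_low = s2[lower] if lower >= 0 else 0.0
    # First term: k times the sum of the two entries of the sum column.
    first = k * (s1[i + k] + s1_low)
    # Second term: difference of two entries of the double-sum column.
    second = s2[i + k - 1] - s2_low
    return (first - second) / (2 * sum(a * a for a in range(1, k + 1)) * h)


y = [0, 4, 25, 50, 67.4, 124.9, 172.0, 201.4, 288.1, 321.3, 387.1]
v8 = derivative_from_sums(y, 8)   # line x = 8, y = 288.1
```

For the line x = 8 the two terms are 3825.0 and 3274.9, as in the text, and `v8` is their difference divided by 10.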

In many problems of engineering we are interested in the 
integral of a function and also in the moment of the integral. 
The method discussed in the present section shows how both 
quantities can be obtained by constructing the Σy and the Σ²y
columns. 


10. The second derivative of an empirical function. The difficulty 
of coping with the noise in the problem of differentiation is even 
more strongly emphasized if the second derivative of an empirically 
observed function has to be found. Yet such problems are frequent 
in analysis of tracking data, since we want to draw conclusions 
concerning the action of forces in aerodynamical problems; thus we 
are forced to obtain the acceleration of the displacement measure- 
ments. The difficulty can be illustrated by the fact that even a 
sudden change in the force will cause but a slight disturbance in the 
displacement. Smoothing our data causes a small error if the 
displacement itself is considered, but the second derivative may be 
altered by that process very considerably. Hence we cannot expect 
any great accuracy in the numerical evaluation of the second 
derivative. 

Instead of trying to obtain the second derivative directly, it seems 
preferable to proceed in two steps, obtaining the first derivative by 
the process described before and then applying the same process once 
more to the first derivative. Since each single point of the derivative 
involves an interval of 5 observations, and again an interval of 5 
derivative points contributes to one point of the derivative of the 
derivative, we find that nine consecutive points of the original curve 
determine one point of the acceleration curve. This gives a very good 
chance for averaging out accidental errors but works against us if the 
acceleration curve is in reality unsmooth because of more or less 
sudden changes of the force. 

If we are not interested in the velocity curve but want to obtain the 
acceleration curve directly, we can combine the two operations into 
one movable strip operation, letting the movable strip −2, −1, 0, 1, 2
operate on ···, 0, 0, 1, 0, 0, ··· and then repeating the operation on the
result. We thus find that the code numbers of the second derivative
process become

$$ \frac{4,\ 4,\ 1,\ -4,\ -10,\ -4,\ 1,\ 4,\ 4}{100h^2} \tag{5-10.1} $$

This process is numerically very convenient since all the multiplica- 
tions by 4 can be combined into one operation. Once more we put 
the code numbers on a movable strip and let that strip glide down 
along the column of data. The result belongs to the time moment 
opposite to the code number — 10. 

As a numerical example, we carry through the process for the data 
of the table (8.5), obtaining now the second derivative y''(x) from
second to second. The h of our problem is 1, and the adjustment due 
to the denominator of (1) is merely a shift of the decimal point by 2 
places to the left. 


 strip |   y   |  y''
    4  |   0   |
    4  |   4   |
    1  |  25   |
   −4  |  50   |
  −10  |  67.4 | 7.97        (5-10.2)
   −4  | 124.9 | 5.98
    1  | 172.0 | 4.64
    4  | 201.4 |
    4  | 288.1 |
       | 321.3 |


Procedure for the first four and last four ordinates. Since the general 
process requires 4 neighbors on both sides of each observation, we 
lose the first four acceleration values at the head of our table, and 
the same will happen again at the end. We can restore these values, 
however, although with diminished reliability, by using the separate 
technique (8.6) for evaluation of the first (and last) two ordinates of 
the velocity curve. The final result can be expressed in terms of 
weight factors which have to be applied to the first sequence of 
observations. We obtain a separate set of factors for the first,
second, third, and fourth points of the acceleration curve. They are
tabulated in the successive columns of the following table. 


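      | y''(0) | y''(h) | y''(2h) | y''(3h)
 y_0  |   115  |    85  |    53   |    26
 y_1  |  −116  |   −76  |   −33   |    −4
 y_2  |  −124  |   −84  |   −51   |   −18
 y_3  |   118  |    58  |    13   |   −14       (5-10.3)
 y_4  |    25  |    15  |     2   |    −8
 y_5  |   −18  |     2  |     8   |     2
 y_6  |        |        |     8   |     8
 y_7  |        |        |         |     8

To be divided by 200h².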


In our numerical example (2), application of these weights leads to the 
following results: 


y"(0) = 8.86 
y"(1) = 8.78 
y"(2) = 8.76 (5-10.4) 
y"(3) = 7.66 
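The weight factors can be rederived mechanically by composing the velocity procedure with itself: formulas (5-8.6) for the first two ordinates and formula (5-8.3) elsewhere. The sketch below does this with exact fractions for h = 1; the names `velocity_matrix`, `head_weights` and `WEIGHTS` are hypothetical.

```python
from fractions import Fraction


def velocity_matrix(n):
    """Matrix D with y'(i) = sum_j D[i][j] * y[j] for h = 1.
    The last two rows are left at zero; they are not needed here."""
    D = [[Fraction(0)] * n for _ in range(n)]
    for j, w in enumerate((-21, 13, 17, -9)):   # (5-8.6), first ordinate
        D[0][j] = Fraction(w, 20)
    for j, w in enumerate((-11, 3, 7, 1)):      # (5-8.6), second ordinate
        D[1][j] = Fraction(w, 20)
    for i in range(2, n - 2):                   # (5-8.3), interior ordinates
        for a in (-2, -1, 1, 2):
            D[i][i + a] = Fraction(a, 10)
    return D


def head_weights(n=8):
    """Integer weights (to be divided by 200 h^2) of y_0 ... y_{n-1}
    for the first four points of the acceleration curve."""
    D = velocity_matrix(n)
    columns = []
    for i in range(4):
        row = [sum(D[i][k] * D[k][j] for k in range(n)) for j in range(n)]
        columns.append([int(200 * w) for w in row])
    return columns


WEIGHTS = head_weights()

# Application to the first eight ordinates of the numerical example.
y = [0, 4, 25, 50, 67.4, 124.9, 172.0, 201.4]
first_four = [sum(w * v for w, v in zip(col, y)) / 200 for col in WEIGHTS]
```

Each list in `WEIGHTS` is one column of the weight table, with zeros where the table has blanks, and `first_four` corresponds to the four accelerations quoted above.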


We have pointed out already in the velocity procedure that the 
combination of 5 successive data may not be tolerable if the time 
marks of our observations do not follow each other closely enough. 
The table (8.8) was thus based on only four neighboring observations, 
and we have obtained the velocities halfway between the data points. 
Carrying through the same process twice, the original position of 
the points is once more restored. We now utilize only three instead 
of four neighbors on each side of every point. The code numbers for 
the acceleration curve now become 


$$ \frac{9,\ 6,\ -5,\ -20,\ -5,\ 6,\ 9}{100h^2} \tag{5-10.5} $$


and we lose only 3 instead of 4 ordinates in the beginning and at the 
end of the curve. 
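The three-neighbor code numbers follow in the same way from the four-point strip −3, −1, 1, 3 (divided by 10h) applied twice. A minimal sketch, with hypothetical names `convolve`, `CODE3` and `second_derivative_three`, using the first eleven ordinates of the take-off data:

```python
def convolve(a, b):
    """Discrete convolution of two short strips of code numbers."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, z in enumerate(b):
            out[i + j] += x * z
    return out


# Applying the half-step strip twice restores the original data positions.
CODE3 = convolve([-3, -1, 1, 3], [-3, -1, 1, 3])   # divided by 100 h^2


def second_derivative_three(y, h=1.0):
    """Second derivative with three neighbors on each side;
    None where three neighbors are not available."""
    out = [None] * len(y)
    for i in range(3, len(y) - 3):
        out[i] = sum(c * y[i + j - 3] for j, c in enumerate(CODE3)) / (100 * h * h)
    return out


y = [0, 4, 25, 50, 67.4, 124.9, 172.0, 201.4, 288.1, 321.3, 387.1]
acceleration = second_derivative_three(y)
```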




Since the first velocity we possess is f’(h/2), and the derivative of 
these data starts with f”(h), we are unable to give f”(0). Hence we 
have to leave a blank space opposite the first and the last observa- 
tion. The weights for the next two accelerations are given in the 
following table which takes the place of the more elaborate table (3). 


      | f''(h) | f''(2h)
 y_0  |   52   |   27
 y_1  |  −54   |  −14
 y_2  |  −44   |  −29       (5-10.6)
 y_3  |   36   |    1
 y_4  |   16   |    6
 y_5  |   −6   |    9

To be divided by 100h².


A comparison of the four-neighbor and three-neighbor technique 
of obtaining the acceleration of take-off measurements is given in the 
following table, which contains the first 20 acceleration data of a 
flight test, once evaluated by using four neighbors (y”) and once by 
using three neighbors (y”) on both sides of each observation. 


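    y   | y'' (4) | y'' (3) |    y    | y'' (4) | y'' (3)
    0   |  8.86   |         |  387.1  |  8.22   |  9.90
    4   |  8.78   |  8.13   |  500.6  |  3.60   |  4.01
   25   |  8.76   |  7.97   |  574.8  |  3.39   | −1.59
   50   |  7.66   |  8.59   |  650.1  |  3.81   |  3.84
   67.4 |  7.97   |  8.08   |  755.8  |  6.26   |  7.46      (5-10.7)
  124.9 |  5.98   |  6.31   |  815.8  | 10.17   | 11.70
  172   |  4.64   |  4.03   |  961.9  |  8.25   |  8.31
  201.4 |  6.12   |  4.39   | 1051.8  |  7.49   |  8.52
  288.1 |  6.58   |  6.09   | 1187.8  |  5.21   |  3.91
  321.2 |  8.76   | 11.33   | 1321.6  |  5.49   |  4.27

(4) = four neighbors, (3) = three neighbors on both sides of each observation.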


The general pattern of the acceleration course comes out similarly 
in both calculations. However, the amplitudes of the fluctuations 
are much larger in the second case where 3 instead of 4 neighbors 
were used. From the discrepancy of the two sets of results, the 
conclusion can be drawn that the second-to-second observations of
this take-off were not sufficiently close together to make a satisfactory
smoothing possible without disturbing at the same time the true 
course of the path. The first phase of an airplane flight is not smooth 
and far from the steady-state conditions which dominate the later 
development of the flight. The transient oscillations, caused by the 
coupling between horizontal and vertical motion, make themselves 
felt even during the time when the airplane is still on the runway. The 
condition that the acceleration does not change essentially during 
five or even four consecutive measurements was not fulfilled in the 
present problem. It would have been necessary to obtain four to 
five frames per second as the basis of our analysis if our aim is to get 
a reliable acceleration curve in which the random errors of the 
observations are eliminated and yet the true course of the acceleration 
is essentially preserved. 


11. Smoothing in the large by Fourier analysis. In our previous 
discussions every observation was combined with its immediate 
neighbors to the left and to the right. We made use of the analytical 
nature of f(x) in the neighborhood of a point and tried to eliminate 
the nonanalytical behavior of the noise by operations which did not 
leave the immediate neighborhood of a given point x. We can thus 
speak of “smoothing in the small,” or the “neighbor technique,” of 
eliminating noise. We will now consider an entirely different 
possibility. Instead of breaking our observations into a sequence of 
neighborhoods we may consider the entire set of data as one unified 
whole and try to find clues by which the true course of the function 
and the superimposed noise may be separated. This method of 
smoothing has the advantage that it is more independent of any 
special assumptions concerning the nature of the unknown f(x). In 
our previous considerations we have assumed, for example, that in a 
certain finite neighborhood of a point, the second derivative of f(x) is 
practically a constant. This means in physical terms that the force 
acting on a moving body changes but slowly within a specified time 
interval. Such assumptions may not always hold under actual 
aerodynamical conditions. A sudden gust, for example, represents 
a sudden and unforeseen change in the second derivative which does 
not satisfy our previous hypothesis of continuity. The neighbor 
technique of smoothing would tend to smooth out this discontinuity
and thus change the true course of the second derivative by
presupposing a smoothness which does not agree with the actual
physical picture. Hence we may welcome a method of smoothing 
for which such assumptions need not be made. 

Whenever approximation problems in the large are considered, 
the Fourier series appears automatically on the scene as one of the 
most powerful mathematical tools. We can thus attempt to analyze 
the problem of noise in terms of the Fourier series. We have observed 
already in our earlier discussions that the noise does not share with 
ordinary analytical functions the property of smoothness and differen- 
tiability. This property puts the noise into a special category with 
respect to the Fourier series. 

The Fourier series is strictly speaking an infinite series, but the 
convergence of the series makes it possible to truncate the series to 
the first n terms. How large this n has to be chosen will depend 
decisively on the analytical nature of the function to which the 
Fourier series is applied. If the function is everywhere continuous 
but the derivative is discontinuous at some point, the terms of the 
Fourier series decrease with the speed $n^{-2}$. If the function itself
becomes discontinuous at some point, the terms decrease with the
speed of $n^{-1}$ only. A pulse of short duration endangers the convergence
of the series, and the formal expansion of an infinitely
sharp pulse (“delta function”) becomes actually divergent. Now 
“noise” can be conceived as an irregular sequence of sharp pulses, 
and thus the harmonic analysis of noise will not have the tendency 
to converge. Here, then, is a chance to distinguish between the true 
course of a function and the noise superimposed on it. The harmonic 
analysis of the function will show much faster convergence than the 
harmonic analysis of the noise. 

In order to take advantage of this characteristic difference in the 
convergence behaviour of function and noise, it is necessary to make 
the dividing line as sharp as possible. Given is a large number of 
observations at the points 


$$ x = 0,\ h,\ 2h,\ \ldots,\ nh = l \tag{5-11.1} $$


If we do not use proper precautions, the Fourier series set up for the 
approximation of the unknown f(x) will have very poor convergence 
even if f(x) is everywhere well behaved, because the boundary 
conditions are not fulfilled. Since neither f(x) nor f'(x) returns at 
x = l to the value it had at x = 0, the discontinuity at the boundary
will determine the rate of convergence. We can improve the convergence
by reflecting f(x) as an even function for negative x, and
thus expand f(x) into a pure cosine series. This dispenses with the
discontinuity in the function itself, but the derivative is still discontinuous
at the boundary. However, we can go one step further.
We subtract from f(x) a properly chosen α + βx, thus considering

$$ g(x) = f(x) - (\alpha + \beta x) \tag{5-11.2} $$

We determine α and β by the boundary conditions

$$ g(0) = 0, \qquad g(l) = 0 \tag{5-11.3} $$

Then we reflect g(x) for negative x as an odd function.

$$ g(-x) = -g(x) \tag{5-11.4} $$

The result is that we have obtained a function which, if made periodic
with the period 2l, has no discontinuity in either function or
derivative. The first discontinuity appears in the second derivative.
The asymptotic order of magnitude of the Fourier terms is now $n^{-3}$,
and that is a practically satisfactory convergence.
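For equidistant data the two preparatory steps, removal of the straight line through the end points and odd reflection, can be sketched as below. The function names and the sample values are hypothetical; the data are assumed to be given at x = 0, h, ..., nh.

```python
def remove_linear_trend(y):
    """Subtract alpha + beta*x so that the first and last ordinates vanish.
    Returns the modified data g together with alpha and beta*h."""
    n = len(y) - 1
    alpha = y[0]
    beta_h = (y[n] - y[0]) / n        # beta times the spacing h
    g = [y[k] - alpha - beta_h * k for k in range(n + 1)]
    return g, alpha, beta_h


def odd_periodic_extension(g):
    """Samples of one full period 2l of the odd reflection of g,
    at x = 0, h, ..., (2n - 1) h."""
    n = len(g) - 1
    # For n <= k < 2n periodicity and oddness give g(kh) = -g((2n - k) h).
    return g[:n] + [-g[n - j] for j in range(n)]


g, alpha, beta_h = remove_linear_trend([1.0, 2.5, 2.0, 4.0, 5.0])
period = odd_periodic_extension(g)
```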

Hence we are going to develop the function g(x) into a pure sine 
series of the form 


$$ g(x) = b_1 \sin\frac{\pi x}{l} + b_2 \sin\frac{2\pi x}{l} + \cdots \tag{5-11.5} $$

Since we have at our disposal the values

$$ y_k = f(kh), \qquad (k = 0, 1, 2, \ldots, n) \tag{5-11.6} $$

we first modify f(x) to

$$ g(x) = f(x) - f(0) - \frac{f(l) - f(0)}{l}\, x \tag{5-11.7} $$

and achieve the boundary conditions (3). Then we determine the
coefficients $b_k$ of the expansion (5) by the condition that at the data
points x = kh the series shall give the modified basic data g(kh), that
is, the original measurements corrected by α + βx. This yields

$$ b_k = \frac{2}{n} \sum_{\alpha=1}^{n-1} g(\alpha h) \sin\Bigl( k\alpha\, \frac{\pi}{n} \Bigr) \tag{5-11.8} $$
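The coefficient formula can be sketched directly. The function names `sine_coefficients` and `sine_synthesis` are hypothetical, and the sample values of g are arbitrary illustrative numbers with vanishing end values. Summing all n − 1 terms of the series at a data point should give back the datum.

```python
import math


def sine_coefficients(g):
    """Coefficients b_1 ... b_{n-1} from the n + 1 samples g[0..n],
    where g[0] = g[n] = 0 is assumed."""
    n = len(g) - 1
    return [
        2.0 / n * sum(g[a] * math.sin(k * a * math.pi / n) for a in range(1, n))
        for k in range(1, n)
    ]


def sine_synthesis(b, n, a):
    """Value of the full sine series at the data point x = a*h."""
    return sum(bk * math.sin((k + 1) * a * math.pi / n) for k, bk in enumerate(b))


g = [0.0, 0.5, -1.0, 0.25, 0.0]          # hypothetical modified data, n = 4
b = sine_coefficients(g)
rebuilt = [sine_synthesis(b, 4, a) for a in range(5)]
```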


Now smoothing is always based on the fact that we have many 
more measurements at our disposal than are needed by the smooth- 
ness of the function. The harmonic analysis of f(x) does not contain 
overtones beyond a certain frequency, usually called the “cutoff
frequency” $\nu_0$. This means that beyond a certain predictable
point the Fourier coefficients $b_k$ are practically zero. The highest
order k = m which need be considered is determined by the
condition

$$ \frac{m\pi}{l} = 2\pi\nu_0 \tag{5-11.9} $$

or

$$ m = 2\nu_0 l = 2\nu_0 n h \tag{5-11.10} $$

This gives the condition

$$ \frac{m}{n} = 2\nu_0 h \tag{5-11.11} $$

The product $2\nu_0 h$, i.e., the double cutoff frequency times the time
interval between two consecutive measurements, is a pure number
which is decisive for the effectiveness with which we can smooth our
measurements. We will denote the reciprocal of this number by the
symbol p and call it “smoothing parameter.”

$$ p = \frac{1}{2\nu_0 h} \tag{5-11.12} $$


If p is smaller than 1, this means that our measurements are so far 
apart that they are unable to determine the course of f (x) even in the 
absence of noise. If p = 1, we have just the minimum number of 
observations for the determination of f(x) and nothing is left over for 
smoothing. Hence p must be larger than 1 in order to smooth at all. 
Generally the significance of the smoothing parameter p is the ratio 
of the actual number of observations to the minimum number required 
in the absence of noise. Effective smoothing demands that p shall be 
at least 2, but we will hardly be satisfied if it is less than 4 to 5. 
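A small sketch of the two quantities just introduced. The function names and the numerical values (cutoff frequency 0.125 cycles per second, one observation per second, n = 40 intervals) are hypothetical and chosen only for illustration.

```python
def smoothing_parameter(nu0, h):
    """p = 1 / (2 nu0 h): ratio of the observations we have to the minimum needed."""
    return 1.0 / (2.0 * nu0 * h)


def highest_harmonic(nu0, h, n):
    """m = 2 nu0 n h: highest order of the sine series that is not pure noise.
    In practice this would be rounded to an integer."""
    return 2.0 * nu0 * n * h


p = smoothing_parameter(0.125, 1.0)   # 4.0
m = highest_harmonic(0.125, 1.0, 40)  # 10.0, i.e. m / n = 1 / p
```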




Let us first assume that for physical reasons we can establish the 
position of the cutoff frequency $\nu_0$ in advance. Then we can argue as
follows. In the absence of noise the Fourier coefficients $b_1, b_2, \ldots, b_m$
would have certain values but would be practically zero beyond $b_m$.
Then the Fourier synthesis

$$ g(x) = b_1 \sin\frac{\pi x}{l} + b_2 \sin\frac{2\pi x}{l} + \cdots + b_m \sin\frac{m\pi x}{l} \tag{5-11.13} $$


would properly interpolate our function, not only in the data points 
but at all points of the range. 

The presence of noise changes the picture. The coefficients 
$b_{m+1}, \ldots, b_{n-1}$ are no longer zero nor have they the tendency to
diminish. They represent the nonconvergent part of the Fourier 
series, caused by the nonanalytical nature of the noise. An ideally 
“random” noise would have a Fourier spectrum which has no 
preference for any frequency and would thus have an average 
amplitude with random fluctuations for all frequencies. If our 
analysis included the sine and cosine functions, we could speak of a 
“random distribution of phase,” while the amplitudes would remain 
of the same constant order of magnitude. Since our analysis contains 
only sine functions, the randomness of the phase is replaced by a 
random sequence of plus and minus signs in the distribution of the
$b_k$ amplitudes ($k > m$).

Now the amplitudes $b_k$ of the harmonic analysis are influenced by
the noise in two ways. The amplitudes beyond $b_m$ are completely
caused by noise. In this part of the spectrum we can eliminate the
noise altogether by simply omitting in the Fourier synthesis every
term beyond $k = m$. Originally we assumed the Fourier series in the
form

$$g(x) = \sum_{k=1}^{n-1} b_k \sin\frac{k\pi x}{l} \tag{5-11.14}$$

and determined the coefficients $b_k$ by the condition that our data shall
be exactly represented; now we truncate the series by forming the
sum

$$\bar{g}(x) = \sum_{k=1}^{m} b_k \sin\frac{k\pi x}{l} \tag{5-11.15}$$




By this process we have eliminated all the high-frequency components 
of the noise. Now it is true that even the $b_k$ components for $k \le m$
are to some extent influenced by the noise. But in this part of the 
spectrum we are unable to separate noise and true component, 
since the randomness of the sign prevents us from subtracting the 
noise part of the component. This uncertainty is unavoidable. Yet 
we have succeeded in eliminating the major portion of the noise by 
disregarding that part of the Fourier series which is completely 
caused by noise. 
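The truncation just described can be tried out in a few lines of Python. This is only a sketch under stated assumptions: the coefficients are computed with the standard discrete sine formula $b_k = (2/n)\sum_j g_j \sin(k\pi j/n)$ for data that vanish at both ends (presumably what formula (11.8) expresses), and the test signal, the noise amplitude, the seed, and the cutoff m = 4 are illustrative choices, not values from the text.

```python
import math
import random

def sine_coefficients(g):
    """Coefficients b_1..b_{n-1} of samples g[0..n] with g[0] = g[n] = 0."""
    n = len(g) - 1
    return [2.0 / n * sum(g[j] * math.sin(k * math.pi * j / n) for j in range(1, n))
            for k in range(1, n)]

def sine_synthesis(b, m, n):
    """Sum of the first m terms of the sine series at the data points j = 0..n."""
    return [sum(b[k - 1] * math.sin(k * math.pi * j / n) for k in range(1, m + 1))
            for j in range(n + 1)]

def rms(u, v):
    """Root-mean-square difference of two equally long sequences."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(u, v)) / len(u))

n = 32
rng = random.Random(1)                      # fixed seed: reproducible "noise"
clean = [math.sin(math.pi * j / n) + 0.5 * math.sin(3 * math.pi * j / n)
         for j in range(n + 1)]
noisy = [c + (rng.uniform(-0.1, 0.1) if 0 < j < n else 0.0)
         for j, c in enumerate(clean)]

b = sine_coefficients(noisy)
exact = sine_synthesis(b, n - 1, n)         # all n-1 terms: reproduces the data
smooth = sine_synthesis(b, 4, n)            # series truncated after k = m = 4

print("error of the raw data  :", rms(noisy, clean))
print("error after truncation :", rms(smooth, clean))
```

Because the sine vectors are orthogonal over the equidistant data points, the truncated synthesis equals the clean signal plus the projection of the noise on the first m components, so its error cannot exceed that of the raw data; the part of the noise that falls into the retained components stays, as the text points out.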

In actual practice we will seldom possess the frequency $\nu_0$ in
advance. Even if we know that for physical reasons no frequency
beyond a certain $\nu_0$ can appear in the measured g(x), there is no
guarantee that all the frequencies up to $\nu_0$ are genuine. The smoothness
of g(x) may be such that the genuine spectrum ends much sooner
than $\nu_0$, and all amplitudes beyond that point are spurious and should
be omitted. The smaller the number of components of the genuine 
function g(x) is, the more we will succeed in eliminating the noise of 
our measurements. 

Now the subtraction of a linear trend $\alpha + \beta x$ from our observations
made it possible to reduce the unsmoothness at the boundary to
a discontinuity in the second derivative. This is a mathematical
condition which is frequently matched by the actual physical situation.
In tracking problems, a discontinuity in the second derivative
means a sudden change in the force. Such a sudden change in the
force, necessitated mathematically by our desire to make our
function periodic, can occur also physically during our observations
on account of sudden gusts, or running out of fuel, or separation
from the booster, and similar effects. Hence the physical unsmoothness
of the problem is of the same character as the unsmoothness at
the boundary, and the $n^{-3}$ law of the coefficients could not be improved
even if the function were genuinely periodic.


12. Empirical determination of the cutoff frequency $\nu_0$. Under
the given conditions we do much better if we do not try to determine
$\nu_0$ in advance but let our data themselves make the decision. The
decrease of the Fourier components according to the $n^{-3}$ law is
sufficiently steep for a rather effective separation of the analytical and
the nonanalytical parts of the spectrum. We make a complete
Fourier analysis of our data and plot the components $b_k$ obtained as
ordinates at the abscissa values $k = 1, 2, 3, \ldots$. In the beginning,
the $b_k$ are large, and no particular law can be detected in their distribution.
But then they diminish rather steeply to relatively small
values which do not decrease any further. From here on we observe 
that the amplitudes remain within a certain band of plus-minus 
values. They have no tendency to become either much larger or much 
smaller than a certain average size. This average size can be ascertained
by starting from the end of the spectrum and evaluating the
sum

$$p^2 = \frac{1}{N}\left[b_{n-1}^2 + b_{n-2}^2 + \cdots + b_{n-N}^2\right] \tag{5-12.1}$$

As N increases, p will approach a fairly constant value. We then
draw the horizontal lines $y = \pm p$ and determine the point k = m
where these lines intersect the low-frequency part of the spectrum.
We keep all the $b_k$ up to k = m and omit all the $b_k$ with k > m.
The mere truncation of the Fourier series to m terms performs the 
smoothing of our data. 

It is impossible to determine the exact point k = m where we 
should terminate the series. A certain “twilight zone” is inevitable 
in which the $b_k$ caused by the function and those caused by the noise
are of the same order of magnitude. This uncertainty, however, is of 
no critical importance. We have the inevitable uncertainty caused by 
the low-frequency noise in the analytical components which could not 
be eliminated. In the face of this uncertainty it makes very little 
difference whether a few spurious Fourier coefficients have been 
added or not, or whether we have omitted a few Fourier coefficients 
which actually belong to the function but which have been 
interpreted as noise. It is important only that the twilight zone 
shall not be too extended. But this danger does not exist, since 
the $n^{-3}$ law guarantees a sufficiently steep decrease of the genuine
amplitudes to separate them from the nondecreasing noise components.

The following numerical example gives an actual demonstration
of the method. A sequence of 68 take-off data was subjected to a
Fourier sine analysis, after subtraction of the linear trend $\alpha + \beta x$.
The Fourier components $b_k$ were evaluated on the basis of formula
(11.8) (with n = 67), with the help of I.B.M. equipment. The
analysis gave the following values for the 66 sine components of the
given function g(x).


 k |   b_k   |  k |  b_k  |  k |  b_k  |  k |  b_k
 1 | 2024.40 | 17 |  2.33 | 33 |  4.50 | 49 |  1.27
 2 |  200.65 | 18 | -5.05 | 34 | -2.47 | 50 | -3.54
 3 |  115.14 | 19 |  2.65 | 35 |  2.99 | 51 |  1.83
 4 |   20.35 | 20 | -1.80 | 36 | -2.85 | 52 | -0.23
 5 |   16.08 | 21 |  2.53 | 37 | -0.48 | 53 |  2.83
 6 |    5.62 | 22 | -0.87 | 38 | -0.07 | 54 |  1n
 7 |    8.31 | 23 | -1.58 | 39 |  0.02 | 55 | -3.59
 8 |    6.89 | 24 |  0.60 | 40 |  2.29 | 56 |  2.50      (5-12.2)
 9 |    0.95 | 25 | -4.33 | 41 | -1.14 | 57 | -4.35
10 |    7.62 | 26 |  4.26 | 42 |  0.49 | 58 |  2.93
11 |   -2.30 | 27 | -3.82 | 43 | -2.42 | 59 |  0.34
12 |    3.24 | 28 |  4.70 | 44 | -0.46 | 60 | -0.12
13 |    1.97 | 29 | -4.45 | 45 | -0.04 | 61 |  0.62
14 |    1.31 | 30 |  0.55 | 46 |  0.58 | 62 | -2.57
15 |    3.89 | 31 | -1.13 | 47 |  2.18 | 63 |  0.12
16 |   -2.86 | 32 | -2.14 | 48 | -1.91 | 64 | -1.19
   |         |    |       |    |       | 65 |  2.25
   |         |    |       |    |       | 66 |  0.67


Examination of this table shows that the “noise” in the present
problem is not of a completely random character. The sequence of
the ± signs shows a definite regularity, with many systematic +, −
alternations. Such regularities can occur if we have isolated “glaring
errors” among our observations which fall out of the average order
of magnitude of errors. We could have eliminated these “bad spots”
right at the beginning by the device of differencing; this would 
have made the noise more incoherent. However, the method 
of the Fourier series has the advantage that these bad spots do 
no essential damage to the smooth part of the function, but 
merely increase the noise to some extent. Hence we get valid 
results even without removing the obviously bad observations from 
our data. 

The “twilight zone” is not extensive in our problem. The ordinate
$b_{10} = 7.62$ is definitely outside the noise band and has to be included
in the analytical part of the function. We may terminate the series
here, but we may also include $b_{11}$ and $b_{12}$ in our expansion and
terminate our series at k = 12. The uncertainty is thus of no
vital consequence. We shall decide on k = 12 and omit all overtones
beyond the 12th. The number of observations is thus about 5.5
times as large as the needed number of components. Hence we can
assume that the truncated series, apart from being smooth, has
eliminated about 80% of the total noise. The remaining 20% is
present in the form of a contamination of the retained first twelve
$b_k$ components.
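The choice of the cutoff can be imitated numerically with the tabulated components. The following is a minimal sketch, not the author's procedure: it assumes that the average size of (12.1) is a root-mean-square value, takes it over the last twelve components only, and uses a hypothetical stopping rule (`cutoff_index`, which ends the series just before the first two consecutive components that fall inside the noise band). The numbers are the first sixteen and last twelve entries as read from table (5-12.2).

```python
import math

# Sine components as read from table (5-12.2)
b_low = [2024.40, 200.65, 115.14, 20.35, 16.08, 5.62, 8.31, 6.89,
         0.95, 7.62, -2.30, 3.24, 1.97, 1.31, 3.89, -2.86]        # k = 1..16
b_tail = [-3.59, 2.50, -4.35, 2.93, 0.34, -0.12,
          0.62, -2.57, 0.12, -1.19, 2.25, 0.67]                    # k = 55..66

def noise_level(tail):
    """Root-mean-square size of the last N components (assumed form of the average)."""
    return math.sqrt(sum(x * x for x in tail) / len(tail))

def cutoff_index(b, level, run=2):
    """Hypothetical rule: keep everything before the first `run` consecutive
    components that lie inside the band +/- level."""
    inside = 0
    for k, bk in enumerate(b, start=1):
        inside = inside + 1 if abs(bk) <= level else 0
        if inside == run:
            return k - run
    return len(b)

p = noise_level(b_tail)
m = cutoff_index(b_low, p)
print(f"noise band: +/-{p:.2f}, cutoff m = {m}")
```

With these assumptions the rule stops at m = 12, the value chosen in the text; a different but equally defensible rule could stop at k = 10. Retaining 12 of the 66 components gives 66/12 = 5.5, and if the noise is taken to be spread evenly over all components, the discarded share is 1 − 12/66 ≈ 0.82, in line with the "about 80%" quoted above.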

The question of differentiating this series still remains. The fact 
that all overtones beyond the 12th are negligible for the displacement 
does not mean that the same will be true for the velocity and even less 
for the acceleration. The fact that we have a good approximation for 
f(x) does not imply that we have a good approximation for $f'(x)$
or $f''(x)$ as well. The derivative of a good approximation is not
necessarily a good approximation of the derivative of the function.
The function $f''(x)$ is much less smooth than f(x) itself and may
require a considerably larger number of Fourier terms for its 
representation than f(x). But these higher order terms are not 
available by differentiating the Fourier series of f(x) because 
the b, associated with these k are almost completely caused by 
noise and are not indicative of the behavior of f(x) in the absence 
of noise. 
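The way differentiation brings the noise back can be made explicit by differentiating a sine series of the form used in § 11 term by term. The step is not written out in the text; $l$ denotes the length of the range.

```latex
g(x)   = \sum_{k} b_k \sin\frac{k\pi x}{l}, \qquad
g'(x)  = \sum_{k} \frac{k\pi}{l}\, b_k \cos\frac{k\pi x}{l}, \qquad
g''(x) = -\sum_{k} \left(\frac{k\pi}{l}\right)^{2} b_k \sin\frac{k\pi x}{l}
```

Each component is multiplied by $k\pi/l$ in the velocity and by $(k\pi/l)^2$ in the acceleration. Components of high order, which are negligible in the displacement, therefore carry a much larger relative weight in the derivatives, and it is exactly these components that are dominated by noise.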

Under these circumstances we have to return to the local operations
by which a derivative can be defined. A derivative is by its very
nature determined by the values of f(x) in the neighborhood of a
certain point. The law “mass times acceleration equals moving
force” shows that in a sufficiently small time interval the acceleration
cannot change too rapidly. We do not go wrong if we assume that
five neighboring observations lie on a parabola of third order. Then
the second derivative has still enough freedom to change linearly in
this time interval. If this condition is not fulfilled, our observations
are altogether too far apart from each other to allow any effective
smoothing. A cubic parabola of the form $y = \alpha + \beta x + \gamma x^2 + \delta x^3$
has four degrees of freedom, and thus we determine four constants
on the basis of five observations. The amount of overdetermination
is thus slight and we did not violate the fidelity requirements of our
problem. If we take the second derivative of this local least-square
parabola at the point of symmetry, we obtain the following symmetric
weighting of five successive ordinates, which take the place of the
previous weight factors (10.1):


$$\frac{2,\; -1,\; -2,\; -1,\; 2}{7h^2} \tag{5-12.3}$$
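These weights can be checked by actually fitting the least-square cubic. The sketch below is an independent verification, not the author's derivation: it sets up the normal equations for five ordinates at unit spacing (h = 1), solves them in exact rational arithmetic, and reads off the second derivative of the fitted polynomial. The function names and the parameter `x0` (the abscissa at which the derivative is taken) are introduced here for illustration.

```python
from fractions import Fraction

def solve(A, rhs):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def second_derivative_weights(x0=0, degree=3, xs=(-2, -1, 0, 1, 2)):
    """Weights w_i with sum(w_i * y_i) = second derivative at x0 of the
    least-square polynomial through the ordinates at xs (unit spacing)."""
    size = degree + 1
    A = [[Fraction(sum(x ** (i + j) for x in xs)) for j in range(size)]
         for i in range(size)]
    weights = []
    for p in range(len(xs)):
        # fit the unit "observation" that is 1 at xs[p] and 0 elsewhere
        rhs = [Fraction(xs[p] ** i) for i in range(size)]
        a = solve(A, rhs)
        weights.append(sum(j * (j - 1) * a[j] * Fraction(x0) ** (j - 2)
                           for j in range(2, size)))
    return weights

w = second_derivative_weights()
print([str(7 * wi) for wi in w])
```

Multiplied by 7, the weights at the point of symmetry come out as the integers 2, −1, −2, −1, 2. Calling the function with another `x0` gives the weights for the same parabola evaluated off center.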


In the previous scheme every observation was combined with four 
of its neighbors on both sides. Hence nine consecutive data deter- 
mined one point of the acceleration curve. This is justified if the 
observations follow each other in sufficiently close intervals. But 
if this condition is not fulfilled, the previous method (10.1) will lead 
to oversmoothing. We will blot out certain details of the accelera- 
tion curve which are real and not caused by noise. It is necessary to 
keep in mind that the problem of smoothing has two aspects. One is 
that we should eliminate noise as much as possible; the other is that 
we should not eliminate details which actually belong to the function 
to be observed. We have no a priori reasons to assume that the 
acceleration curve will be particularly smooth if the flight of an 
airplane is in question. Sudden gusts may interfere with the action 
of the regular aerodynamical forces, but even the ordinary aero- 
dynamical forces alone can cause complicated deviations from 
steady-state flight. If we want to study the acceleration curve 
realistically, we will try to avoid oversmoothing the curve. The 
weighting (3) has better fidelity chances than the previous weighting 
(10.1), since it involves only five instead of nine consecutive data. On 
the other hand, the danger is now that we have not succeeded 
sufficiently with elimination of the noise. The acceleration curve 
thus obtained may be too unsmooth because of observational 
errors. 

At this point the Fourier analysis technique may be invoked for 
additional smoothing. If n is the total number of observations and
we evaluate n — 1 sine coefficients for the Fourier analysis of the 
acceleration data obtained by local least square parabolas, we can 
assume that not all these coefficients will actually be needed for 
representation of the true acceleration. The highest coefficients 
correspond to high-frequency oscillations which for aerodynamical 
reasons have to be considered highly implausible. An earlier 
termination of the series will thus properly smooth the acceleration 
curve, without violating essentially the fidelity requirements. A 
combination of slight local smoothing with additional smoothing by
Fourier truncation can thus be considered the most plausible solution
of the problem of smoothing, if great caution is demanded by
the fact that the data do not follow each other closely enough to 
allow effective smoothing by local parabolas alone. 

The numerical example tabulated below shows the effect of smoothing
by two different techniques. A sequence of 68 take-off observations 
was analyzed to obtain the acceleration at every point of the 
curve. The column s contains the actual displacement data, in feet, 
taken at intervals of h = 0.96 second. The column y contains 
the acceleration, evaluated by the method of §10. The first four 
and the last four y values were obtained with the help of the table 
(10.3). 

The column a has a different origin. The weighting was now done
according to the scheme (12.3), omitting first the denominator $7h^2$.
The first two and the last two acceleration data were obtained on the
assumption that the local parabola constructed at the point k = 2
(and likewise k = n − 2) can be used for obtaining the acceleration
at the missing points. This gives the following table, in correspondence
to the previous table (10.3).

        |  f''_0 |  f''_1
   y_0  |    9   |   5.5
   y_1  |  -15   |  -8
   y_2  |   -2   |  -2       To be divided by 7h^2      (5-12.4)
   y_3  |   13   |   6
   y_4  |   -5   |  -1.5


The values thus obtained (without the common denominator $7h^2$)
are contained in column a. These values show considerable scatter
and require additional smoothing. For this purpose the method of the
Fourier analysis was employed. A Fourier sine series was applied
in accordance with the principles of § 11. In our example the
subtraction of a linear trend $\alpha + \beta x$ could be omitted because the
acceleration could easily be extrapolated to zero at both ends of the
series. The data were thus directly suited to a sine analysis. The
series was then truncated to 30 terms. The synthesis of these 30
terms, evaluated for the points of observation, gave the column ā.
Finally, dividing by $7h^2$, we obtained the last column ȳ.




 k |     s |     y |   a |     ā |     ȳ ||  k |     s |     y |   a |     ā |     ȳ
 0 |   117 | -0.74 |  14 |  12.1 |  1.86 || 34 |  4042 |  2.85 |  19 |  16.3 |  2.51
 1 |   144 |  2.38 |  22 |  17.7 |  2.72 || 35 |  4230 |  2.84 |  17 |  21.5 |  3.31
 2 |   163 |  5.69 |  30 |  38.8 |  5.96 || 36 |  4425 |  2.57 |  23 |  21.9 |  3.37
 3 |   192 |  8.34 |  74 |  67.1 | 10.31 || 37 |  4619 |  2.29 |  12 |  13.9 |  2.14
 4 |   229 |  9.48 |  75 |  78.5 | 12.07 || 38 |  4819 |  1.82 |  12 |   6.0 |  0.92
 5 |   281 |  8.70 |  67 |  62.9 |  9.67 || 39 |  5017 |  2.03 |   7 |   8.4 |  1.29
 6 |   340 |  6.98 |  34 |  38.0 |  5.84 || 40 |  5218 |  2.52 |  11 |  18.5 |  2.84
 7 |   407 |  5.55 |  26 |  26.8 |  4.12 || 41 |  5420 |  3.18 |  31 |  25.3 |  3.89
 8 |   472 |  5.05 |  34 |  31.1 |  4.78 || 42 |  5623 |  3.48 |  25 |  23.7 |  3.64
 9 |   545 |  5.11 |  39 |  35.8 |  5.50 || 43 |  5839 |  3.27 |  24 |  19.7 |  3.03
10 |   625 |  5.40 |  27 |  34.0 |  5.23 || 44 |  6047 |  3.53 |   5 |  20.5 |  3.15
11 |   706 |  6.02 |  35 |  34.8 |  5.35 || 45 |  6266 |  3.39 |  36 |  24.4 |  3.75
12 |   792 |  6.69 |  51 |  44.1 |  6.78 || 46 |  6479 |  3.51 |  28 |  25.5 |  3.92
13 |   887 |  7.01 |  49 |  52.7 |  8.10 || 47 |  6708 |  2.87 |  18 |  21.3 |  3.27
14 |   989 |  6.82 |  47 |  49.2 |  7.56 || 48 |  6933 |  1.95 |   5 |   Ia) |   24]
15 |  1096 |  6.37 |  39 |  38.0 |  5.84 || 49 |  7157 |  1.37 |  16 |   5.5 |  0.84
16 |  1212 |  5.91 |  35 |  33.0 |  5.07 || 50 |  7389 |  0.90 |   3 |   3.6 |  0.55
17 |  1329 |  5.79 |  37 |  37.5 |  5.76 || 51 |  7618 |  1.68 |  -6 |   4.1 |  0.64
18 |  1453 |  5.66 |  41 |  41.3 |  6.35 || 52 |  7845 |  2.79 |  19 |  18.1 |  2.77
19 |  1584 |  5.33 |  35 |  36.9 |  5.67 || 53 |  8075 |  3.97 |  43 |  32.9 |  5.05
20 |  1718 |  5.24 |  32 |  30.5 |  4.69 || 54 |  8312 |  4.72 |  30 |  36.3 |  5.58
21 |  1858 |  5.10 |  32 |  30.8 |  4.73 || 55 |  8557 |  4.67 |  24 |  29.5 |  4.53
22 |  2002 |  5.04 |  36 |  35.5 |  5.46 || 56 |  8798 |  4.57 |  32 |  25.8 |  3.97
23 |  2150 |  4.91 |  32 |  35.5 |  5.46 || 57 |  9049 |  5.00 |  37 |  30.7 |  4.72
24 |  2306 |  4.51 |  30 |  29.1 |  4.47 || 58 |  9305 |  3.63 |  17 |  30.0 |  4.61
25 |  2462 |  4.17 |  25 |  23.6 |  3.63 || 59 |  9562 | -0.07 |  12 |   8.5 |  1.31
26 |  2625 |  3.82 |  28 |  23.9 |  3.67 || 60 |  9821 | -3.30 | -16 | -22.8 | -3.51
27 |  2790 |  3.69 |  18 |  26.0 |  4.00 || 61 | 10082 | -4.55 | -35 | -33.2 | -5.10
28 |  2959 |  3.71 |  27 |  25.4 |  3.90 || 62 | 10330 | -0.32 | -18 |  -9.4 | -1.44
29 |  3129 |  3.76 |  28 |  23.7 |  3.64 || 63 | 10578 |  2.04 |  32 |  25.3 |  3.89
30 |  3307 |  3.46 |  25 |  24.0 |  3.69 || 64 | 10830 |  2.00 |  42 |  38.4 |  5.90
31 |  3486 |  3.17 |  17 |  23.8 |  3.66 || 65 | 11092 |  0.99 |  18 |  25.2 |  3.87
32 |  3668 |  3.10 |  23 |  19.9 |  3.06 || 66 | 11356 |  0.64 |  11 |   7.7 |  1.18
33 |  3853 |  3.12 |  16 |  15.2 |  2.34 || 67 | 11616 |  0.30 |   4 |   0   |  0
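The interior entries of column a follow directly from column s by the five-point weighting (12.3) without its denominator. As a small illustration (the function name `raw_acceleration` is introduced here; the displacements are the first twelve values of column s):

```python
WEIGHTS = (2, -1, -2, -1, 2)      # scheme (12.3) without the denominator 7h^2

# displacement s in feet for k = 0..11, copied from the table above
s = [117, 144, 163, 192, 229, 281, 340, 407, 472, 545, 625, 706]

def raw_acceleration(s, k):
    """Weighted sum of the five ordinates centred on k (an entry of column a)."""
    return sum(w * s[k + d] for w, d in zip(WEIGHTS, range(-2, 3)))

a = {k: raw_acceleration(s, k) for k in range(2, 10)}
print(a)
```

The same weighting applied at any interior point k = 2, ..., 65 gives the corresponding entry of column a, which offers a convenient check on the tabulated figures.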


For the sake of comparison, the two curves y and ȳ are plotted in
conjunction (cf. the accompanying figure: y = dotted line, ȳ = solid
line). We notice that the nine-point smoothing oversmooths the
curve by blotting out certain details of the acceleration curve which
belong to the data and are not the result of the smoothing procedure.
The origin of these oscillations cannot be decided on the basis of the
data alone. A careful examination of the physical situation would
be needed to obtain more information about these peculiar details of
the ȳ curve. The Fourier analysis reveals unmistakably the presence
of a certain high-frequency component. But on the basis of the data 
alone we cannot decide whether these oscillations belong to the 
aerodynamical situation or to the measuring instruments. 


The present example illustrates the difficulties we may encounter in 
evaluation of data which are not obtained under optimum conditions. 
If the aerodynamical analysis is not interested in all the accidental 
details of the take-off phenomenon, then the stronger smoothing of 
the y curve will be quite sufficient to give the essential features of the
event. But the existence of additional oscillations of small amplitudes
can be, if properly understood, of considerable interest. An adequate 
study of these finer features demands recording instruments of 
superior accuracy and a sufficiently close sequence of observations in 
order to eliminate the instrumental part of the noise. This can be 
done with the help of local least-square parabolas. But then, after 
obtaining a sequence of acceleration data which are considerably 
free of instrumental noise, we can subject these data to an additional 
Fourier analysis. The components which stand out above the noise 
level will reveal the existence of hidden periodicities, the study of 
which may contribute to a deeper understanding of aerodynamic 
phenomena. 


13. Least-square polynomials. In the problem of smoothing we 
have encountered the process of laying a least-square parabola 
through four or five data points. Similar situations occur in many 
other problems of physics and engineering, and thus a general 
treatment of the subject is justified. Gauss, who first treated this 
problem, introduced an elegant notation which brings the resultant 
equations in particularly lucid form. He uses the bracket expression 
[u] in the following sense. The quantity u shall be taken at all the 
data points and the sum of all these values shall be formed. Hence, if 
the data points are distinguished by the subscript $i = 1, 2, \ldots, n$, the
notation [u] shall mean

$$[u] = u_1 + u_2 + \cdots + u_n \tag{5-13.1}$$

consequently $[x^k]$ shall mean

$$[x^k] = x_1^k + x_2^k + \cdots + x_n^k$$

and similarly

$$[yx^k] = y_1 x_1^k + y_2 x_2^k + \cdots + y_n x_n^k$$

It is not assumed that the observations are made at necessarily
equidistant values of the independent variable x.

The general problem of a least-square polynomial can be stated as 
follows: We have theoretical reasons to assume that a set of 
observations 


$$y_1, y_2, \ldots, y_n \tag{5-13.2}$$

which belong to an unknown function y = f(x) at some prescribed
points of the independent variable

$$x_1, x_2, \ldots, x_n \tag{5-13.3}$$

can be fitted by a polynomial of the order m.

$$y = a_0 + a_1 x + \cdots + a_m x^m \tag{5-13.4}$$

We know the order m of the polynomial, while the coefficients
$a_0, a_1, \ldots, a_m$ are at our disposal and to be determined by the measurements.
We assume that the number of observations surpasses m,
otherwise the problem has no unique solution.

The unknown coefficients $a_j$ of the polynomial (13.4) are now
determined by the following principle. At each point of observation
we form the “residual”

$$a_0 + a_1 x_i + \cdots + a_m x_i^m - y_i \tag{5-13.5}$$

and take the sum of the squares of all these residuals.

$$Q = \sum_{i=1}^{n} \left(a_0 + a_1 x_i + \cdots + a_m x_i^m - y_i\right)^2 \tag{5-13.6}$$

The quantity Q is by nature positive or in the limit zero. The zero
value is only possible if each one of the residuals vanishes, that is,
if our measurements are all consistent and fit an mth order polynomial
exactly. This cannot be expected. We can, however, find
$a_j$ values for which the sum Q becomes a minimum. We consider the
polynomial associated with these $a_j$ as the “best fit” of our
measurements (cf. also § 16).

This minimum principle has a unique solution in the form of 
a linear set of equations with nonvanishing determinant. According 
to the principles of calculus the condition of minimum requires that 
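the partial derivative of Q with respect to any $a_j$ shall vanish. This
leads to the equations

$$\begin{aligned}
a_0[x^0] + a_1[x^1] + \cdots + a_m[x^m] &= [y] \\
a_0[x^1] + a_1[x^2] + \cdots + a_m[x^{m+1}] &= [xy] \\
&\;\;\vdots \\
a_0[x^m] + a_1[x^{m+1}] + \cdots + a_m[x^{2m}] &= [x^m y]
\end{aligned} \tag{5-13.7}$$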




These remarkable equations, called the “normal equations” of the 
least square problem, belong to a group of linear equations which 
have not only a symmetric but even a “recurrent” matrix,
characterized by the property

$$a_{ik} = a_{i-1,\,k+1} \tag{5-13.8}$$

The matrix of this system depends on only 2m + 1 different quantities,
instead of the usual full number (m + 1)(m + 2)/2. We have
encountered this type of equations before (cf. 3-3.5) and seen that 
there exists a successive algorithm for their solution. 
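The structure of the normal equations is easy to see in code. The sketch below builds the matrix from the 2m + 1 power sums in Gauss's bracket notation and solves the system by plain elimination in exact fractions; it does not use the successive algorithm referred to above, and the names `bracket`, `solve`, and `least_square_polynomial` as well as the sample data are introduced here for illustration only.

```python
from fractions import Fraction

def bracket(values):
    """Gauss's bracket [u]: the sum of u over all data points."""
    return sum(values)

def solve(A, rhs):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def least_square_polynomial(xs, ys, m):
    """Coefficients a_0..a_m from the normal equations, plus their matrix."""
    # Only the 2m+1 power sums [x^0] .. [x^(2m)] enter the matrix.
    power_sums = [bracket(Fraction(x) ** k for x in xs) for k in range(2 * m + 1)]
    A = [[power_sums[i + k] for k in range(m + 1)] for i in range(m + 1)]
    rhs = [bracket(Fraction(x) ** i * y for x, y in zip(xs, ys))
           for i in range(m + 1)]
    return solve(A, rhs), A

xs = [0, 1, 2, 4, 7]                       # need not be equidistant
ys = [1 + 2 * x + 3 * x * x for x in xs]   # exact parabola, so that Q = 0
coeffs, A = least_square_polynomial(xs, ys, 2)
print([str(c) for c in coeffs])
```

Since the sample ordinates lie exactly on a parabola, the residuals vanish and the fit returns the generating coefficients. The matrix element in row i and column k is the power sum of order i + k, which is the recurrent property in concrete form.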

In the most important case of equidistant observations, the scale 
factor of x can be normalized in such a way that the constant interval 
between two observations shall become 1. Moreover, the origin 
x = 0 of the variable x can be put symmetrically into the mid-point 
of the total range. Then for every given m and every given n a 
numerical solution of the system (7) can be found, and the results can 
be tabulated in such a form that for every coefficient $a_j$ the given
data $y_i$ are to be multiplied by a pretabulated set of numbers. This
obviates the necessity of solving the system (7) separately in every
case. Tables of this kind, for a reasonably small range of m and a
reasonably large range of n, have actually been published.¹

¹ Cf. H. T. Davis, Tables of the Higher Mathematical Functions, Vol. I (The
Principia Press, Bloomington, 1933).


14. Polynomial interpolations in the large. If a function f(x) is
observed in equidistant intervals, the question arises what the values
of f(x) are between the data points. In § 2 was developed the
Gregory-Newton formula, which interpolates the function between the data
points by a polynomial in x. In § 3 were developed the Stirling and
Bessel kind of interpolation formulas which operate with central
differences and give much better convergence. In both kinds of
interpolation we take it for granted that the function can be interpolated
with the help of a power series. Although this assumption
seems very reasonable, its validity is actually by no means guaranteed.
Let us assume that a function f(x) exists in an infinite range, from 0
to infinity, and is even analytical throughout this range. Let us give
this function in the infinity of points $x = 0, 1, 2, \ldots$. Can we now
interpolate this function between the data points with the help of a
Gregory-Newton interpolation? Will this interpolation formula
converge and will it give the right answer even if it converges? The 
closer investigation of this problem shows that only a very definite 
class of functions, defined by a certain integral transform, allows the 
Gregory-Newton type of polynomial interpolation. Functions which 
do not belong to this class yield a divergent expansion, which means 
that the interpolation formula loses its significance. 

This phenomenon shows that great caution is demanded when 
interpolating by powers. The difficulty is usually hidden by the fact 
that in tabulated functions the higher differences decrease so fast that 
in a few steps they are “off the board.” This does not mean, however, 
that the interpolation formula, if pursued to arbitrarily high terms, 
would reduce the error to arbitrarily small amounts. In fact, in many 
cases something quite unexpected would happen. The error would 
get smaller and smaller by the correcting influence of the higher 
differences, but eventually a minimum would be reached, and beyond 
that the error would increase again and become arbitrarily large. 
(We have in mind mathematical functions which in principle could 
be evaluated to any degree of accuracy. In observed functions the 
“noise” rules out use of the differences of high order.) 

How can we explain this puzzling phenomenon? Let us observe 
that the central differences found along a certain line of a difference 
table are determined purely by the immediate neighborhood of the 
data point at the head of the line. As we go to higher and higher 
differences, this “neighborhood” spreads out more and more. The 
use of a few terms or of many terms of an interpolation formula can 
thus be juxtaposed as follows. In one case we assume the validity 
of a polynomial approximation in the small, in the other case in the 
large. Now it so happens that a polynomial approximation in a 
sufficiently small neighborhood of a point is always safe and justified. 
But a polynomial approximation in the large is not always safe, and 
demands the proper safeguards. 

Weierstrass proved in 1885 the fundamental theorem that any 
continuous function of a finite range can always be approximated to 
any degree of accuracy by powers. This theorem establishes thus the 
justification for a polynomial expansion, and it seems that we cannot 
go wrong if we interpolate our data by powers. Actually, however, 
the theorem of Weierstrass, while establishing the validity of a 
polynomial approximation, does not imply that the approximating
polynomial is obtainable by fitting equidistant data. It was E. Borel
in 1903 and C. Runge in 1901 who discovered the startling fact that
we can take very simple analytical functions, such as for example,

$$\frac{1}{1 + 25x^2} \tag{5-14.1}$$

in the range [−1, +1] and obtain quite wrong results by equidistant
interpolation. As we put our data points closer and closer
together, the interpolating polynomial which fits all our points
actually converges to the given f(x) unlimitedly in a large portion of
the given range. But outside of a certain precalculable point (in
the example (1) the point $x = \pm 0.726\ldots$) up to the end of the
range the interpolating polynomial does not converge to any limit,
and in fact goes beyond all bounds at every point of this outer range.
Since Runge investigated this phenomenon in great detail, it seems
justifiable to call it the “Runge phenomenon.”
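The phenomenon is easily reproduced. The following sketch interpolates the function (14.1) at equidistant points by a direct Lagrange formula and records the largest error separately for the central part of the range and for the remainder. The degrees 10 and 30, the dividing line |x| = 0.5, and the sampling grid are arbitrary illustrative choices.

```python
def runge(x):
    """The example function (5-14.1)."""
    return 1.0 / (1.0 + 25.0 * x * x)

def lagrange(nodes, values, x):
    """Value at x of the polynomial through the points (nodes[j], values[j])."""
    total = 0.0
    for j, xj in enumerate(nodes):
        term = values[j]
        for i, xi in enumerate(nodes):
            if i != j:
                term *= (x - xi) / (xj - xi)
        total += term
    return total

def max_errors(degree, samples=401):
    """Largest error for |x| <= 0.5 and for |x| > 0.5, equidistant data."""
    nodes = [-1.0 + 2.0 * j / degree for j in range(degree + 1)]
    values = [runge(t) for t in nodes]
    inner = outer = 0.0
    for s in range(samples):
        x = -1.0 + 2.0 * s / (samples - 1)
        err = abs(lagrange(nodes, values, x) - runge(x))
        if abs(x) <= 0.5:
            inner = max(inner, err)
        else:
            outer = max(outer, err)
    return inner, outer

for degree in (10, 30):
    inner, outer = max_errors(degree)
    print(f"degree {degree:2d}: |x| <= 0.5: {inner:.2e}   |x| > 0.5: {outer:.2e}")
```

Raising the degree reduces the error in the middle of the range while the error near the ends grows, which is the behavior described above.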

Hence the strange fact holds that a polynomial which does not 
fit the data points may be in a much better position relative to over- 
all accuracy than a polynomial which fits the data points. The 
following fact for example is of interest. Let f(x) satisfy the boundary
condition $f(\pm 1) = 0$. Let us operate with a polynomial of the order
4n. But instead of fitting 4n + 1 equidistant points exactly, we fit
only the 2n + 1 key values $f(k/n)$, $(k = 0, \pm 1, \ldots, \pm n)$. The data halfway
between are calculated from these key values by the following
interpolation formula.


Hence we have replaced the correct midpoint values by incorrect 
values. Nevertheless, the polynomial thus interpolated gives small 
errors all over the range, while the errors of the correctly interpolated 
polynomial go out of bound. 

The difficulties examined by Runge are caused only by the equidistant
character of the data. If the data are not equidistantly
distributed but are placed into the zeros of the (2n + 1)st Chebyshev
polynomial $T_{2n+1}(x)$, the difficulties disappear. The errors of the
interpolation now oscillate with the same order of magnitude all over
the range and converge at every point of the range to zero as the
number of data points increases to infinity.
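The effect of moving the data into the Chebyshev zeros can be checked on the same example. The sketch assumes the usual formula $\cos\frac{(2j+1)\pi}{2N}$ for the zeros of $T_N(x)$ and again uses a direct Lagrange evaluation; the node counts 11, 21, 41 and the sampling grid are illustrative choices.

```python
import math

def runge(x):
    """The example function (5-14.1)."""
    return 1.0 / (1.0 + 25.0 * x * x)

def lagrange(nodes, values, x):
    """Value at x of the polynomial through the points (nodes[j], values[j])."""
    total = 0.0
    for j, xj in enumerate(nodes):
        term = values[j]
        for i, xi in enumerate(nodes):
            if i != j:
                term *= (x - xi) / (xj - xi)
        total += term
    return total

def chebyshev_zeros(count):
    """The zeros of the Chebyshev polynomial T_count(x), all inside [-1, 1]."""
    return [math.cos((2 * j + 1) * math.pi / (2 * count)) for j in range(count)]

def max_error(count, samples=201):
    """Largest interpolation error over [-1, 1] for data at the Chebyshev zeros."""
    nodes = chebyshev_zeros(count)
    values = [runge(t) for t in nodes]
    grid = (-1.0 + 2.0 * s / (samples - 1) for s in range(samples))
    return max(abs(lagrange(nodes, values, x) - runge(x)) for x in grid)

for count in (11, 21, 41):
    print(f"{count:2d} Chebyshev zeros: largest error {max_error(count):.2e}")
```

In contrast to the equidistant case, the largest error over the whole range now decreases steadily as the number of data points is increased.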

Since equidistant data are much more convenient from both the 
numerical and the observational standpoint, we may ask how we can 
obtain effective polynomial approximations in spite of the equi- 
distant character of the data points. It so happens that in many 
problems of physics and engineering it is particularly desirable to 
replace a certain analytical function by a power series. The powers 
of x have great operational advantages, and we may want to use 
them even if the original (tabulated or observed) f(x) is not a power 
series. We know from the theorem of Weierstrass that such a 
replacement is always possible. But we also know from Runge’s 
investigation that we cannot obtain this polynomial by simple 
interpolation. The problem is not one in least squares, since we do 
not know in advance of what order the approximating polynomial 
will be, nor is it desirable to minimize the residuals, since small 
residuals in the data points can cause large errors between.

We return once more to the Fourier expansion of § 11 by which 
we tried to reduce the noise of our data. We subtracted a proper 
linear quantity $\alpha + \beta x$ from the given data and then employed a pure
sine series. We then separated the analytical part of the function 
from the noise part by examining the trend of the Fourier com- 
ponents and truncating the series at the proper point. We thus 
obtained an analytical expression which not only smoothed our 
data points but interpolated the values of f(x) between the data 
points. This method of trigonometric interpolation is free of the 
objections of equidistant polynomial interpolation. The trigonometric
functions have orthogonality with respect to equidistant data
and are thus in the same preferential position relative to such data as 
the powers of x are relative to data which are distributed according to 
the zeros of the Chebyshev polynomials. The trigonometric sine 
interpolation will automatically converge to f(x) at every point of the 
interval if the data points get denser and denser. 

Our aim is, however, to obtain a polynomial approximation of f(x).
For this reason we will now convert our sine expansion into a 
polynomial expansion. To use the Taylor expansion for each one of 
the sine functions and then collect terms would not serve our purpose, 
since the resulting series would have slow convergence and thus 
require a large number of terms. We know in advance that we fare
best if we use the Chebyshev polynomials for expansion purposes,
since this series will give fastest convergence (cf. VII, 6). Hence we 
will have to investigate the problem of converting a sine function 
into Chebyshev polynomials. 

The sine functions of the expansion (11.13), for data normalized 
to the range [0,2], are the functions 


    \varphi_k(x_1) = \sin \frac{k\pi}{2} x_1        (5-14.3)


It will be more convenient, however, to put the origin of our reference 
system in the mid-point of the range, thus separating in advance the 
even and the odd parts of our function. This means that x_1 is to
be replaced by x = x_1 − 1. The functions \varphi_k(x) now become


    \varphi_{2k}(x)   = (-1)^k \sin k\pi x
    \varphi_{2k+1}(x) = (-1)^k \cos (k + \tfrac{1}{2})\pi x


The first group of functions gives the odd, the second group gives the 
even part of the function g(x); the addition of the correction α + βx
finally restores the original f(x). 

Now the expansion of the trigonometric functions (4) into 
Chebyshev polynomials is available in terms of the Bessel functions, 
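J_n(z). We obtain the following results.


    \sin k\pi x = 2 \sum_{\alpha=0}^{\infty} (-1)^{\alpha} J_{2\alpha+1}(k\pi) \, T_{2\alpha+1}(x)        (5-14.4)


    \cos (k + \tfrac{1}{2})\pi x = 2 \sum_{\alpha=0}^{\infty}{}' (-1)^{\alpha} J_{2\alpha}\big( (k + \tfrac{1}{2})\pi \big) \, T_{2\alpha}(x)        (5-14.5)


The \sum' in the second formula refers to the fact that the first term of
the sum must be halved.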

The method of interpolating a set of equidistant data by powers can 
thus be described as follows. After applying a linear correction which 
makes the two extreme data zero, we expand the data into a Fourier 
sine series, according to the method discussed in IV, 12. If the data 
are free of noise (mathematical data), we leave this expansion as it is. 
If the data have noise superimposed on them, we truncate the series at 
a properly chosen point. We now have a Fourier sine series of m
terms, with the coefficients b_1, b_2, ···, b_m. We convert this series into
an infinite expansion of Chebyshev polynomials,


    g(x) = \sum_{k=0}^{\infty}{}' c_k T_k(x)        (5-14.6)


the coefficients of which are evaluated as follows (the prime again
indicating that the term k = 0 is halved).


    c_{2k}   = 2(-1)^k \left[ J_{2k}(\tfrac{\pi}{2}) b_1 - J_{2k}(\tfrac{3\pi}{2}) b_3 + J_{2k}(\tfrac{5\pi}{2}) b_5 - \cdots \right]
                                                                        (5-14.7)
    c_{2k+1} = 2(-1)^k \left[ -J_{2k+1}(\pi) b_2 + J_{2k+1}(2\pi) b_4 - J_{2k+1}(3\pi) b_6 + \cdots \right]


The values of the Bessel functions of even order at the odd multiples
of π/2, and the values of the Bessel functions of odd order at the
multiples of π can be pretabulated (cf. Table XI). Evaluation of the
expansion coefficients c_k is then reduced to multiplication of the b_k
coefficients by a numerical matrix.
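The conversion just described can be tried numerically. The following is a minimal sketch, not the tabulated procedure of the text: it assumes sine functions sin(kπx_1/2) on [0,2], shifted to x = x_1 − 1, computes the Bessel functions from their ascending power series instead of a table, and uses the hypothetical names `bessel_j`, `sine_to_chebyshev`, `chebyshev_sum` and `sine_sum`.

```python
import math

def bessel_j(n, z, terms=40):
    """J_n(z), integer n >= 0, from the ascending power series (adequate for moderate z)."""
    return sum((-1) ** m * (z / 2) ** (2 * m + n)
               / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

def sine_to_chebyshev(b, order):
    """b[0] holds b_1, b[1] holds b_2, ...; returns [c_0, ..., c_order]."""
    c = []
    for j in range(order + 1):
        total = 0.0
        for i, b_i in enumerate(b, start=1):
            if (i + j) % 2 == 1:  # odd sine index feeds even c_j, and vice versa
                total += (-1) ** (i // 2) * bessel_j(j, i * math.pi / 2) * b_i
        c.append(2 * (-1) ** (j // 2) * total)
    return c

def chebyshev_sum(c, x):
    """Sum of c_k T_k(x) with the k = 0 term halved, using T_k(cos t) = cos(k t)."""
    theta = math.acos(x)
    return 0.5 * c[0] + sum(c[k] * math.cos(k * theta) for k in range(1, len(c)))

def sine_sum(b, x):
    """The sine series written in the original variable x_1 = x + 1 on [0, 2]."""
    return sum(b_i * math.sin(i * math.pi * (x + 1) / 2)
               for i, b_i in enumerate(b, start=1))

b = [1.0, -0.5, 0.25, 0.1]          # illustrative sine coefficients, not from the text
c = sine_to_chebyshev(b, order=24)
for x in (-0.9, -0.3, 0.0, 0.4, 0.8):
    print(x, sine_sum(b, x), chebyshev_sum(c, x))
```

The two printed columns should agree closely, since the neglected Chebyshev coefficients beyond order 24 are very small for arguments up to 2π.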

In the absence of noise we can go one step further. Evaluation of 
the Fourier coefficients b_k and subsequent evaluation of the c_k can be
combined into one single step. We can take our equidistant data,
after separating the even and the odd parts of the function g(x), and
directly multiply them by a preassigned numerical matrix, thus
obtaining in one step the even and the odd c_k, without any preliminary
Fourier analysis (cf. Table XII). 

Theoretically the expansion (6) into the Chebyshev polynomials
T_k(x) is an infinite expansion. However, the good convergence of the
series makes an early termination possible. We evaluate the finite
sum of ν + 1 terms,


    g_\nu(x) = \sum_{k=0}^{\nu}{}' c_k T_k(x)        (5-14.8)

at the data points and see how much error is caused by the truncation
of the series. We can stop at an order k = ν at which the residuals
become sufficiently small. In the presence of noise, a natural termina- 
tion is effected by the criterion that the accuracy of our interpolation 
need not go much beyond the accuracy of our data. Hence we will 
truncate the series (8) at an order at which the maximum residual in 
any of the data points remains well within the average error of the 
data. 
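The termination rule can be sketched as a short search over the order ν. This is an illustration only: the coefficients, the data points and the tolerance below are made up, and the names `cheb_partial_sum` and `choose_order` are hypothetical.

```python
import math

def cheb_partial_sum(c, nu, x):
    """Partial sum of c_k T_k(x) up to order nu, with the k = 0 term halved."""
    theta = math.acos(max(-1.0, min(1.0, x)))
    return 0.5 * c[0] + sum(c[k] * math.cos(k * theta) for k in range(1, nu + 1))

def choose_order(c, xs, ys, tol):
    """Smallest order nu whose maximum residual at the data points is within tol."""
    for nu in range(len(c)):
        worst = max(abs(cheb_partial_sum(c, nu, x) - y) for x, y in zip(xs, ys))
        if worst <= tol:
            return nu
    return len(c) - 1

c = [2.0, 0.5, 0.1, 0.01, 0.001]               # illustrative coefficients
xs = [-1 + 0.2 * i for i in range(11)]          # equidistant data points
ys = [cheb_partial_sum(c, len(c) - 1, x) for x in xs]
print(choose_order(c, xs, ys, tol=0.02))
```

With an assumed data error of 0.02 the search stops as soon as the neglected terms can no longer exceed that level at any data point.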


This method of obtaining a well-convergent polynomial expansion 
for a set of equidistant data has the advantage that we need not know 
in advance what order polynomial will be the most suitable for our 
purposes. The data themselves decide the most appropriate poly- 
nomial for a given accuracy. The difficulties of the Runge pheno- 
menon are avoided and we obtain a close approximation compar- 
able with that obtainable by the theoretically more desirable but 
practically much less accessible unequal distribution of data points 
corresponding to the zeros of the first neglected Chebyshev poly-
nomial T_{ν+1}(x). In our case the data points are equidistant and we
still have the benefit of an expansion into Chebyshev polynomials, 
which will give a practically uniform approximation throughout the 
range. 


15. The convergence of equidistant polynomial interpolation. The 
previous section gave an answer to the problem of polynomial 
interpolation in equidistant points which solved the two basic 
difficulties: the problem of the noise and the problem of the non- 
uniformity of the error which can cause oscillations of harmful 
amplitudes around the two ends of the interval. Both phases of the 
problem were beneficially influenced by the temporary interjection 
of the Fourier sine functions, which make an effective separation of 
function and noise possible and which distribute the errors with a 
uniform order of magnitude throughout the given interval. The 
resultant truncated sine series was finally converted into an infinite 
Chebyshev expansion, which again could be terminated at a properly 
designated point. By this procedure we know in advance that our 
interpolation must converge to the given f(x) for any finite, single- 
valued, sectionally continuous function which does not oscillate 
infinitely many times in the given interval. 

However, in spite of these results, the investigation of Runge 
concerning the convergent or divergent behavior of the simple 
equidistant interpolation remains of fundamental interest. Since 
certain functions give convergent, certain others divergent expan- 
sions, the question arises whether we can decide in advance what type 


1 This last section is a brief summary of an elaborate investigation of equi- 
distant polynomial interpolation, with many numerical examples, of which a 
multilithed laboratory report came out under the title: “Analytical and Practical 
Curve Fitting of Equidistant Data,” Nat. Bur. Standards, Report 1591, 1952. 


of functions will lead to one and what type to the other kind of 
expansion. This decision can actually be made if we know the 
analytical nature of the function f(x). We will show that if we can 
consider u = f(z) a function of the complex variable z = x + iy, 
the behavior of f(z) in a certain region around the x-axis uniquely
determines the convergence behavior of this function with respect to 
equidistant interpolation, without any detailed investigation of the 
remainder. What we have to know is merely whether or not a 
certain explicitly given oval-shaped region around the x axis is free of
analytical singularities. If the function f(z) behaves throughout this 
region including the boundary analytically, we know in advance that 
the equidistant polynomial interpolation of this function will con- 
verge uniformly in the interval [—1,+1]. If at any point a singularity 
occurs, the convergence will hold only within a certain subinterval of 
the total range [—1,+1], while outside of this range the interpolation 
diverges as n grows to infinity. 

This convergence behavior is very similar to that of the Taylor 
series, which is likewise determined by the analytical nature of f(z) 
outside of the x axis, even if our interest is completely restricted 
to the real range. If we draw a circle from the center of ex- 
pansion which reaches up to the nearest point of singularity, 
the Taylor series will definitely converge inside of this circle and 
definitely diverge outside of it, while on the circle itself the behavior 
is dubious. 

The difference in our case is only that the critical region is much 
more complicated than a circle, although mathematically available. 
We start with the following fundamental function.


    \frac{F_m(z) - F_m(x)}{z - x}        (5-15.1)


where F_m(x) is the fundamental polynomial, i.e., the polynomial
composed of the root factors


    F_m(x) = (x - x_1)(x - x_2) \cdots (x - x_m)        (5-15.2)

where x_1, x_2, ···, x_m are the points of interpolation.


Now the numerator of (1) is divisible by the denominator, and
thus the function (1), considered as a function of x, must be a
polynomial of the order m − 1. The same remains true even if we
divide by F_m(z) and consider the function


    \varphi(x,z) = \frac{F_m(z) - F_m(x)}{F_m(z) \, (z - x)},        \frac{1}{z - x} = \varphi(x,z) + \frac{F_m(x)}{F_m(z)} \frac{1}{z - x}        (5-15.3)


Consequently the function \varphi(x,z), considered as a function of x,
allows an exact polynomial interpolation, no matter how the x_k are
located. The resulting expansion terminates after m terms. Moreover
the second term on the right side of (3) has no influence on the
interpolation, since it vanishes at all points x_k. This term can be
considered the remainder of the interpolation.
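The statement can be checked numerically for any set of points. The sketch below interpolates 1/(z − x) by the Lagrange formula at five arbitrarily chosen points and adds the explicit remainder F_m(x)/[F_m(z)(z − x)]. The nodes, the complex value z and the helper names are illustrative assumptions.

```python
def fundamental(t, nodes):
    """F_m(t): the product of the root factors (t - x_k)."""
    prod = 1
    for xk in nodes:
        prod *= (t - xk)
    return prod

def lagrange(x, nodes, values):
    """Value at x of the polynomial of order m - 1 through the given data."""
    total = 0
    for i, xi in enumerate(nodes):
        weight = values[i]
        for j, xj in enumerate(nodes):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += weight
    return total

def remainder_mismatch(z, nodes, x):
    """|1/(z - x) - (interpolant + explicit remainder)|; should be roundoff only."""
    values = [1 / (z - xk) for xk in nodes]
    interpolant = lagrange(x, nodes, values)
    remainder = fundamental(x, nodes) / (fundamental(z, nodes) * (z - x))
    return abs(1 / (z - x) - (interpolant + remainder))

print(remainder_mismatch(0.3 + 0.8j, [-1.0, -0.35, 0.1, 0.6, 1.0], 0.47))
```

The printed mismatch is at roundoff level, irrespective of how the nodes are placed.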

We thus see that the special function


    \psi(x) = \frac{1}{z - x}        (5-15.4)

has the property that, if interpolated by powers in arbitrarily chosen
points, the remainder of the interpolation is explicitly available.
We restrict ourselves to the case of equidistant interpolation and
put

    x_k = \frac{k}{n}        (k = 0, \pm 1, \cdots, \pm n)

    \Phi_{2k+1}(x) = kx \, \frac{(k^2x^2 - 1)(k^2x^2 - 4) \cdots (k^2x^2 - k^2)}{(2k + 1)!}        (5-15.5)


Then m = 2n + 1 and F_m(x) becomes proportional to the (2n + 1)st
Stirling function:


    F_{2n+1}(x) = x \left( x^2 - \frac{1}{n^2} \right) \left( x^2 - \frac{4}{n^2} \right) \cdots \left( x^2 - \frac{n^2}{n^2} \right)

                = \frac{(2n + 1)!}{n^{2n+1}} \, \Phi_{2n+1}(x)        (5-15.6)


The resulting expansion becomes 


E npe) i... n Pan(%) 
aT AO pO T 1 Pa 
2n 
Bi 2 m + Neny(%,2)  (5-15.7) 


E a (k + 1) Pp41(2) 


where

    \eta_{2n+1}(x,z) = \frac{\Phi_{2n+1}(x)}{\Phi_{2n+1}(z)} \, \frac{1}{z - x}


We will now make use of Cauchy’s fundamental integral theorem.


    f(x) = \frac{1}{2\pi i} \oint \frac{f(z)}{z - x} \, dz        (5-15.9)


where the integration extends over a closed loop which encloses the 
point z = x and likewise the data points z = x_k, but is free of any
analytical singularities of the function f(z). If in this theorem the 
function (4) is replaced by its expansion (7), we obtain on the right 
side the Stirling expansion of f(x), with a remainder. This remainder 
appears in the following form.


    \eta_{2n+1}(x) = \frac{\Phi_{2n+1}(x)}{2\pi i} \oint \frac{f(z) \, dz}{\Phi_{2n+1}(z) \, (z - x)}        (5-15.10)


Now the function \Phi_{2n+1}(x) remains bounded by ±1 in the entire
range [−1,+1]. Hence we will focus our attention on the function


    \Phi_{2n+1}(z) = \frac{(nz + n)(nz + n - 1) \cdots (nz - n)}{(2n + 1)!}

                   = \frac{(nz + n)!}{(2n + 1)! \, (nz - n - 1)!}        (5-15.11)


and investigate its properties as n grows to infinity.
First we make use of the reflection theorem of the factorial
function.¹


    \frac{1}{(y - 1)!} = \frac{\sin \pi y}{\pi} \, (-y)!        (5-15.12)

because of which the critical function becomes

    \Phi_{2n+1}(z) = \frac{(-1)^n \sin n\pi z}{\pi} \, \frac{[n(1 + z)]! \, [n(1 - z)]!}{(2n + 1)!}        (5-15.13)


1 Cf. {15}, p. 239. 


We agree that the complex variable z shall stay within the positive 
half plane. Then we can effectively approximate the factorial 
function by Stirling’s formula. 


    n! = \sqrt{2\pi} \, n^{n+1/2} e^{-n}        (5-15.14)


This formula becomes arbitrarily accurate as n grows to infinity, 
but it is remarkably accurate even in the realm of small n. 
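The quality of Stirling's formula for small n is easy to inspect. The short sketch below compares it with the exact factorial; the function name `stirling` is a hypothetical helper.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2*pi) * n**(n + 1/2) * exp(-n)."""
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

for n in (1, 2, 5, 10, 20):
    exact = math.factorial(n)
    relative_error = 1 - stirling(n) / exact
    print(n, exact, stirling(n), relative_error)
```

The relative error behaves roughly like 1/(12n), so it is already below one percent at n = 10.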
In terms of this approximation our function becomes


    \Phi_{2n+1}(z) = \frac{(-1)^n}{2n + 1} \sqrt{\frac{n}{\pi}} \, \sqrt{1 - z^2} \left[ \tfrac{1}{4} (1 + z)^{1+z} (1 - z)^{1-z} \right]^n \sin n\pi z

The last factor requires closer attention.

    \sin n\pi(x + iy) = \frac{1}{2i} \left( e^{-n\pi y} e^{in\pi x} - e^{n\pi y} e^{-in\pi x} \right)        (5-15.16)


Let us assume that y is positive. Then the first term becomes 
negligible as n grows to infinity, while the second term can be 
combined with the previous factor to one single expression, raised 
to the power n. 

For our purposes it suffices to investigate the absolute value of 
this expression. 


    A(z) = \tfrac{1}{4} \left| (1 + z)^{1+z} (1 - z)^{1-z} \right| e^{\pi y}        (5-15.17)


The nth power of this number, as n grows to infinity, behaves in an 
extreme fashion. If A(z) is less than 1, the nth power goes to zero, if 
greater than 1, to infinity. Correspondingly the integrand of the 
remainder (10) converges in the first case to infinity, in the second 
case to zero. The boundary A(z) = 1 is characterized by a closed 
ellipse-like curve (see the accompanying drawing), determined by the 
transcendental equation 


    \tfrac{1}{2}(1 + x) \log [(1 + x)^2 + y^2] + \tfrac{1}{2}(1 - x) \log [(1 - x)^2 + y^2]

        + \pi |y| - y \left( \arctan \frac{y}{1 + x} + \arctan \frac{y}{1 - x} \right) = 2 \log 2


The following table tabulates y(x) in intervals of 0.1 of the in- 
dependent variable. The neighborhood of the singular point x = 1, 
y = 0, is tabulated in intervals of 0.01. 


     x       y           x       y           x       y
    0.0   0.5255        0.6   0.3855        0.95  0.0963
    0.1   0.5219        0.7   0.3283        0.96  0.0813
    0.2   0.5110        0.8   0.2556        0.97  0.0652
    0.3   0.4925        0.9   0.1598        0.98  0.0475
    0.4   0.4660        1.0   0.0           0.99  0.0273
    0.5   0.4307                            1.00  0.0


For any closed path which lies completely outside of this trans- 
cendental curve C, the integral on the right side of (10) converges to 
zero as n goes to infinity. This means that the interpolation con- 
verges to the true f(x) throughout the interval of interpolation. 


However, such a closed path (9) can be chosen only if the interior 
of C is free of any singularity of the analytical function f(z). If f(z) 
has one or more singular points within the enclosed region, the 
remainder goes to infinity instead of zero and the interpolation
cannot remain everywhere convergent.
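The critical curve A(z) = 1 can be traced numerically, which also gives a quick test of whether a given singular point lies inside it. The following is a sketch under the assumption that the boundary is the level log A(z) = 0 of the transcendental equation above. The bisection bracket, the function names and the sample point 0.2i are illustrative choices, not taken from the text.

```python
import math

def log_A(x, y):
    """log A(z) for z = x + iy with |x| <= 1 and y != 0; negative inside the curve C."""
    y = abs(y)
    return (0.5 * (1 + x) * math.log((1 + x) ** 2 + y * y)
            + 0.5 * (1 - x) * math.log((1 - x) ** 2 + y * y)
            + math.pi * y
            - y * (math.atan2(y, 1 + x) + math.atan2(y, 1 - x))
            - 2 * math.log(2))

def boundary_y(x, upper=2.0):
    """Height y(x) of the critical curve, found by bisection on log A = 0."""
    lo, hi = 0.0, upper
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if log_A(x, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for x in (0.0, 0.5, 0.9):
    print(x, round(boundary_y(x), 4))

# A singular point is inside the critical region when log A is negative there.
print(log_A(0.0, 0.2) < 0)
```

The computed heights can be compared with the tabulated y(x). A pole on the imaginary axis closer to the origin than about 0.5255 falls inside the curve.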

We have thus found the necessary and sufficient conditions for the 
convergent or divergent behavior of equidistant interpolation. Runge
demonstrated the divergent character of equidistant polynomial
interpolation with the help of the example 


] 


ET 


The singularity occurs here at the points 


which are clearly inside the critical region. 


16. Orthogonal function systems. A much deeper insight into the 
nature of interpolation problems came from a different field, 
developed during the nineteenth century, but fully understood in all 
its fundamental implications in more recent times. It was an 
ingenious geometrical interpretation of the method of least squares 
which opened a new and tremendously fertile field of research. In 
this geometrical picture the concept of a “function” is translated into 
the concept of a “vector,” placed in a space of infinitely many 
dimensions. 

We will assume that a certain continuous domain of the variable x 
is given, limited by x = a and x = b (domains of more than one 
dimension, however, can be treated in an entirely similar way). In 
this domain we consider a function f(x) which shall be conceived as 
everywhere finite and single-valued, and at least sectionally con- 
tinuous. Such a function, although it comprises an infinity of 
values, can nevertheless be tabulated in the given interval with any 
degree of accuracy. Instead of giving all the values of f(x), we record
its values in the everywhere dense set of discrete points x_1, x_2, ···, x_n.
The sectionally continuous character of f(x) makes it possible that
the discrete set of values


    y_1 = f(x_1),   y_2 = f(x_2),   \cdots,   y_n = f(x_n)        (5-16.1)


comes arbitrarily near to any value of f(x) and that this discrete set 
can replace the original f(x) for all analytical operations. Although 
a certain error is committed by this replacement, we can make the 
error as small as we wish, by making n sufficiently large. 

Let us now assume that we plot the functional values as rectangular
coordinates of an imaginary Euclidian space of n dimensions. More
specifically we want to plot along the successive coordinate axes the
values


    f_1 = y_1 \sqrt{\epsilon_1},   f_2 = y_2 \sqrt{\epsilon_2},   \cdots,   f_n = y_n \sqrt{\epsilon_n}        (5-16.2)
where


    \epsilon_k = x_k - x_{k-1}        (x_0 = a)


All the ε_k converge to zero as n increases to infinity.
We now have a definite point


    P = (f_1, f_2, \cdots, f_n)


of an n-dimensional space, representing the given function f(x). We 
can also say that we have constructed the vector OP whose projec- 
tions on the coordinate axes are proportional to the given functional 
values. By going with n higher and higher we will obtain an increas- 
ingly adequate representation of any finite, single-valued, and 
sectionally continuous function of the given interval. The analytical 
concept of a function can thus be dropped in favor of the more vivid 
picture of a vector in a many-dimensional space. The length-square 
of this vector is given by the sum 


    f^2 = \sum_{i=1}^{n} f_i^2 = \sum_{i=1}^{n} y_i^2 \epsilon_i        (5-16.3)


and we notice that in the limit, as n goes to infinity, this sum is 
replaceable by the integral 


    f^2 = \int_a^b f^2(x) \, dx        (5-16.4)


If two functions f(x) and g(x) are considered, they represent two 
vectors, whose mutual orientation can be characterized by their 
“scalar product” 


    f \cdot g = \sum_{i=1}^{n} f_i g_i = \sum_{i=1}^{n} f(x_i) g(x_i) \epsilon_i        (5-16.5)


which again becomes an integral as n grows to infinity:


    f \cdot g = \int_a^b f(x) g(x) \, dx        (5-16.6)
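The passage from the finite sum to the integral can be observed directly. The sketch below forms the discrete scalar product with equal subintervals ε_k = (b − a)/n. The function name `scalar_product` and the test functions are illustrative assumptions.

```python
import math

def scalar_product(f, g, a, b, n):
    """Sum of f(x_k) g(x_k) eps_k over x_k = a + k*eps, k = 1..n, eps = (b - a)/n."""
    eps = (b - a) / n
    return sum(f(a + k * eps) * g(a + k * eps) * eps for k in range(1, n + 1))

identity = lambda x: x

# Length-square of f(x) = x on [0, 1]; the integral of x^2 is 1/3.
for n in (10, 100, 1000):
    print(n, scalar_product(identity, identity, 0.0, 1.0, n))

# Two Fourier functions on [-pi, pi]: orthogonal, with length-square pi.
print(scalar_product(math.sin, math.cos, -math.pi, math.pi, 1000))
print(scalar_product(math.sin, math.sin, -math.pi, math.pi, 1000))
```

The first sequence approaches 1/3 as n grows, while the trigonometric products reproduce the integral values 0 and π almost exactly even for moderate n.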


In this picture the problem of approximating a function by a given 
set of functions—such as the powers of x, for example, in interpola- 
tion problems—appears likewise in new light. A given set of func- 
tions represents a set of vectors which can be conceived as a given 
frame of reference within our imaginary space. It is in fact a partial 
frame only if the number of approximating functions is finite while 
the number of dimensions goes to infinity. The problem of approximating
a function as a linear combination of given functions u_i(x)
can be conceived as the geometrical problem of analyzing a vector
in a given frame of reference. But then it is evident that we will
prefer as particularly convenient the orthogonal frames of reference,
characterized by the fact that any two of the base vectors u_i are
orthogonal to each other:


    u_i \cdot u_k = \int_a^b u_i(x) u_k(x) \, dx = 0        (i \neq k)        (5-16.7)

while the length of any one of the vectors is normalized to 1:

    u_i^2 = \int_a^b u_i^2(x) \, dx = 1        (5-16.8)


Functions which satisfy these conditions are called “orthonormal.” 
The matter of normalization is of smaller importance. The condition 
(7) alone defines the orthogonality of a given set of functions. In such 
an orthogonal frame of reference the problem of analyzing a given
vector is immediately solvable. The mere projection of the vector v
on the orthogonal axes gives the components of v in that particular
frame of reference:


    v = c_1 u_1 + c_2 u_2 + \cdots + c_m u_m

where

    c_i = \frac{v \cdot u_i}{u_i^2}


This means that a function f(x), analyzed in the reference system of the
u_i(x), appears in the form


    f(x) = \sum_{i=1}^{m} c_i u_i(x)        (5-16.9)

where

    c_i = \frac{\int_a^b f(x) u_i(x) \, dx}{\int_a^b u_i^2(x) \, dx}        (5-16.10)


This requires, however, that f(x) shall lie inside the space included by
the m functions u_1(x), ···, u_m(x). If f(x) lies partly outside that space,
we obtain by the above construction a function which can be conceived
as an effective approximation of f(x), in terms of the functions
u_i(x). This approximation is the projection of f(x) into the subspace
of the base vectors u_1(x), ···, u_m(x). This projection can be considered
as that particular linear combination of the given vectors
u_i(x), which comes nearest to the given f(x), inasmuch as the error
of the approximation:


    \eta(x) = f(x) - f_m(x)        (5-16.11)


has the smallest possible length. 
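The minimum property of the projection can be illustrated with a small numerical experiment. The sketch assumes the range [−1,+1], the first three Legendre polynomials as orthogonal base functions, f(x) = e^x as the function to be analyzed, and a simple midpoint rule for the integrals. All of these are illustrative choices.

```python
import math

def integrate(func, a=-1.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (k + 0.5) * h) for k in range(n)) * h

# Three mutually orthogonal functions on [-1, 1] (not normalized).
basis = [lambda x: 1.0, lambda x: x, lambda x: 1.5 * x * x - 0.5]

def project(f, basis):
    """Coefficients c_i = (f . u_i) / (u_i . u_i), as in (5-16.10)."""
    return [integrate(lambda x, u=u: f(x) * u(x)) / integrate(lambda x, u=u: u(x) ** 2)
            for u in basis]

def error_length_sq(f, basis, coeffs):
    """Length-square of eta(x) = f(x) - sum of c_i u_i(x)."""
    def eta(x):
        return f(x) - sum(c * u(x) for c, u in zip(coeffs, basis))
    return integrate(lambda x: eta(x) ** 2)

c = project(math.exp, basis)
best = error_length_sq(math.exp, basis, c)
shifted = error_length_sq(math.exp, basis, [c[0], c[1] + 0.01, c[2]])
print(c, best, shifted)
```

Any change of a coefficient away from its projected value, such as the shift of 0.01 tried here, lengthens the error vector.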

It is conceivable that we may possess function systems which 
constantly satisfy the orthogonality condition (7) and yet newer and 
newer functions can be added to the previous set, without end. Such 
an infinite set of functions may include the entire function space. 
This means that if f(x) is an arbitrary finite, single-valued, and 
sectionally continuous function of the given interval—and we will 
add the condition that f(x) shall not have an infinite number of 
maxima or minima in that interval—we can form approximations of 
increasingly high order and, although it will never happen that we
obtain exactly


    f(x) = \sum_{i=1}^{n} c_i u_i(x)


no matter how far we go with n, yet we may obtain in the limit:


    f(x) = \lim_{n \to \infty} \sum_{i=1}^{n} c_i u_i(x)        (5-16.12)


Such function systems are called “complete orthogonal function 
systems.” Expansions of the form (12), with coefficients determined 
according to (10), are called “orthogonal expansions.” They play 
a superior role in the physical and mathematical problems of our 
days. They are not restricted to one single variable but exist equally 
in any number of variables. Nor is the domain of expansion neces- 
sarily finite. With the proper precautions even infinite domains find 
their place within the function space. 

The Fourier functions, sin kx, cos kx, connected with the range
[−π,+π], were the first example of a complete orthogonal function
system. At the time of their discovery the concept of the function 
space and the wider implications of orthogonality were not yet 
recognized. Today we know that the Fourier functions represent just 
one particularly interesting orthogonal frame of reference within the 
function space. But there are infinitely many other such frames, 
obtainable by a rigid rotation of the original axes, which all share the 
spectacular properties of the Fourier functions in approximating 
highly capricious functions. All these function systems are analytically 
equivalent, in the sense that all orthogonal reference systems of an 
n-dimensional Euclidian space share the same metrical properties, 
because of the homogeneity of space in every direction. 


17. Self-adjoint differential operators. We have seen in Chapter II 
that every symmetric matrix with noncollapsing eigenvalues estab- 
lishes an orthogonal frame of reference by its principal axes which are 
always present in sufficient number. In function space a corre- 
spondingly abundant source of complete orthogonal function 
systems exists through the medium of a certain class of linear 
differential operators called “‘self-adjoint.” 

The important matrix identity 


    y \cdot Ax - x \cdot Ay = 0        (5-17.1)


has a counterpart in the theory of linear differential operators. Let
D be an arbitrary linear differential operator, ordinary or partial.
Then there exists a uniquely determined operator \tilde{D}, obtainable by
purely algebraic and differential operations, which has the following
property:

    v Du - u \tilde{D}v = \frac{\partial F_1}{\partial x_1} + \cdots + \frac{\partial F_n}{\partial x_n}        (5-17.2)

Then, integrating over an arbitrary closed volume of the variables
x_1, x_2, ···, x_n, and transforming on the right side the volume integral
into a surface integral by the Gaussian theorem, we obtain a relation
called “Green’s identity” which is the counterpart of (1):


    \int (v Du - u \tilde{D}v) \, d\tau = surface integral        (5-17.3)


So far the functions u and v are completely arbitrary, except for
their differentiability to the extent demanded by the operators D and
\tilde{D}. We now assume that the function u(x) is subjected to some more
or less stringent “boundary conditions” on the boundary surface.
Then we can always prescribe some properly chosen boundary
conditions for v(x), called the “adjoint boundary conditions,” which
will make the right side of (3) vanish:


    \int (v Du - u \tilde{D}v) \, d\tau = 0        (5-17.4)


In the interior of the region of integration the functions u and v 
are still arbitrary. 

Now it may happen that \tilde{D} = D, in which case we speak of a
“self-adjoint differential operator.” Moreover, we may have 
subjected u to such boundary conditions that the boundary condi- 
tions for v become identical with those for u. We then have a “‘self- 
adjoint differential operator with self-adjoint boundary conditions.” 
Such an operator, conceived as an operator in function space, is a 
counterpart of a symmetric matrix. Its principal axes define a 
complete orthogonal function system. These principal axes are 
defined by the differential equation 


    Du = \lambda u        (5-17.5)


together with the prescribed boundary conditions.
Equation (5) is solvable only for a definite set of λ-values, called
the eigenvalues of D. However, since the function space has infinitely
many dimensions, the eigenvalues are present in infinite number. They
are always real, however, and always discrete, if the differential
operator D is free of any singularities and the domain of integration
is finite. They can be arranged according to their magnitude,
starting with the absolutely smallest λ = λ_1 and continuing to larger
and larger λ_2, λ_3, ···.

The orthogonality of two solutions, belonging to two different 
eigenvalues λ_i and λ_k, follows from (4) if we substitute for u and v
the two solutions u_i and u_k:


    (\lambda_i - \lambda_k) \int u_i u_k \, d\tau = 0        (5-17.6)


Since the first factor cannot vanish if λ_i and λ_k are different, the
second factor must vanish, expressing the orthogonality of the
obtained functions with respect to the realm of integration. In the
case of multiple eigenvalues the associated eigensolutions are not
automatically orthogonal to each other (although they are orthogonal
to all the other u_i of the set) but we can orthogonalize
them by choosing the proper linear combinations (cf. II, 9).
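The substitution that leads to (5-17.6) can be written out in one line. The following worked step assumes a self-adjoint operator (\tilde{D} = D) with self-adjoint boundary conditions, and two eigensolutions D u_i = \lambda_i u_i, D u_k = \lambda_k u_k:

```latex
\begin{aligned}
0 &= \int \left( u_k \, D u_i - u_i \, D u_k \right) d\tau
   && \text{(Green's identity (5-17.4) with } u = u_i,\ v = u_k \text{)} \\
  &= \int \left( \lambda_i \, u_k u_i - \lambda_k \, u_i u_k \right) d\tau
   && \text{(eigenvalue equation (5-17.5))} \\
  &= (\lambda_i - \lambda_k) \int u_i u_k \, d\tau .
\end{aligned}
```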


18. The Sturm-Liouville differential equation. Although an infinite 
variety of self-adjoint differential operators can be constructed, 
together with an appropriate set of boundary conditions, the most 
important orthogonal function systems of applied analysis arise 
from differential operators of second order. This is the lowest order 
for self-adjoint differential operators, since differential operators of 
first order (at least with real coefficients) cannot be self-adjoint. We
will restrict ourselves to one single variable and thus deal with 
ordinary differential operators only. The most general ordinary 
linear and self-adjoint differential operator of second order—first 
investigated by Sturm and by Liouville and thus frequently named 
after them—has the following form: 


    Dy = \frac{d}{dx}(py') + qy        (5-18.1)


The functions p(x) and q(x) are still arbitrary, although p(x) has to 
be differentiable and we usually assume that it does not change its 
sign throughout the given interval. 


Green’s identity (17.3) associated with this differential operator 
becomes 


    \int_a^b \{ v[(pu')' + qu] - u[(pv')' + qv] \} \, dx = \Big[ p(vu' - uv') \Big]_a^b        (5-18.2)


Any boundary condition is permitted which, if equally applied to u 
and v, makes the boundary term vanish, e.g., 


u(a) = u(b) = 0 
or 
u'(a) = u'(b) = 0 
As a simple example let us choose a = −π, b = π, p = −1, q = 0.
The boundary conditions shall be chosen as

    u(-\pi) = u(\pi),   u'(-\pi) = u'(\pi)

The eigenvalues and eigenfunctions are defined by the differential
equation

    -u'' = \lambda u

which is solved by

    u = A \cos \sqrt{\lambda} \, x + B \sin \sqrt{\lambda} \, x

The boundary conditions determine λ to

    \lambda = k^2        (k = 0, 1, 2, \cdots)


and the arbitrariness of A and B shows that every eigenvalue belongs
to two functions. We can separate them by the choice


    u_k = \cos kx,   \bar{u}_k = \sin kx


These are the orthogonal functions of the Fourier series, here 
obtained as the solution of a simple Sturm-Liouville problem. 

Let us extend, however, our considerations to the eigenvalue 
problem associated with the most general linear differential operator 
of second order: 


    A(x)u''(x) + B(x)u'(x) + [C(x) + \lambda]u(x) = 0        (5-18.3)


We can multiply this equation by a certain ρ(x), chosen in such a way
that the operator (1) shall be obtained again. This demands


    \rho(x) B(x) = [\rho(x) A(x)]'


which gives the condition


    \frac{\rho'}{\rho} = \frac{B - A'}{A}        (5-18.4)


We can now put

    P(x) = \rho(x) A(x)        (5-18.5)


and write our equation in the self-adjoint form


    \frac{d}{dx} [P(x) u'(x)] + \rho(x) [C(x) + \lambda] u(x) = 0        (5-18.6)

If we apply Green’s identity, the boundary term becomes again

    \Big[ P(x)(vu' - uv') \Big]_a^b        (5-18.7)


and again we assume that this term vanishes on account of the
boundary conditions. The only difference compared with the previous
case is that the orthogonality of two solutions belonging to λ_i
and λ_k appears in the form


    \int_a^b \rho(x) u_i(x) u_k(x) \, dx = 0        (5-18.8)


An orthogonality condition of this type is called “weighted orthogonality,”
since the function ρ(x), which must remain everywhere
positive inside the interval of integration, can be interpreted as a
weight factor. Function systems associated with weighted orthogonality
are not essentially different from ordinary orthogonal
functions, as we can see if a new independent variable ξ is introduced
by the condition

    d\xi = \rho(x) \, dx        (5-18.9)


The weighted orthogonality in x is now changed to ordinary
orthogonality in ξ.
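The change of variable can be made concrete with a familiar weight. The sketch assumes ρ(x) = (1 − x²)^(−1/2) on [−1,+1], for which dξ = ρ(x) dx gives ξ = arcsin x, and evaluates the weighted product of two Chebyshev polynomials as an ordinary integral in ξ. The function names are hypothetical.

```python
import math

def weighted_product(p, q, n_steps=400):
    """Integral of p(x) q(x) (1 - x^2)^(-1/2) over [-1, 1], computed as an
    unweighted integral in xi = arcsin(x), where d(xi) = rho(x) dx."""
    h = math.pi / n_steps
    total = 0.0
    for k in range(n_steps):
        xi = -math.pi / 2 + (k + 0.5) * h   # midpoint rule in the new variable
        x = math.sin(xi)
        total += p(x) * q(x) * h
    return total

def t2(x):
    return 2 * x * x - 1          # Chebyshev polynomial T_2

def t3(x):
    return 4 * x ** 3 - 3 * x     # Chebyshev polynomial T_3

print(weighted_product(t2, t3))   # different orders: practically zero
print(weighted_product(t2, t2))   # equal orders: pi/2
```

No weight factor appears inside the sum. The singular weight at x = ±1 has been absorbed into the spacing of the points x = sin ξ.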

Weighted orthogonality is particularly useful if the range of
integration becomes infinite in one or both directions. A properly
chosen ρ(x) may prevent the integrals (8) from becoming divergent on
account of the infinite limits. By this artifice the validity of function
space operations can be extended to an infinite interval.


19. The hypergeometric series. One of the many far-sighted 
discoveries of Gauss was the introduction of an infinite series, called 
the “hypergeometric series.” It defines a function of x which at the 
same time depends on three constants α, β, γ, which can assume
arbitrary real or complex values, except that γ must not be zero or a
negative integer. The hypergeometric function, usually denoted by
F(α, β, γ; x), is defined by the following infinite series which converges
for all |x| < 1, and diverges for all |x| > 1, except if it so happens that
the infinite series terminates after a finite number of terms:


    F(\alpha, \beta, \gamma; x) = 1 + \frac{\alpha \beta}{1 \cdot \gamma} x + \frac{\alpha(\alpha + 1) \beta(\beta + 1)}{1 \cdot 2 \cdot \gamma(\gamma + 1)} x^2

        + \frac{\alpha(\alpha + 1)(\alpha + 2) \beta(\beta + 1)(\beta + 2)}{1 \cdot 2 \cdot 3 \cdot \gamma(\gamma + 1)(\gamma + 2)} x^3 + \cdots        (5-19.1)

Almost all the special functions of mathematical physics, with
the exception of the gamma function, are in some relation to the
hypergeometric series, which includes a large class of functions.
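The series lends itself to a simple term-by-term summation, since each term follows from the previous one by a rational factor. The sketch below is a plain partial-sum evaluation for |x| < 1 (or for terminating series). The function name `hypergeometric`, the stopping tolerance and the test cases are illustrative assumptions.

```python
import math

def hypergeometric(alpha, beta, gamma, x, max_terms=500, tol=1e-15):
    """Partial sums of the series (5-19.1); stops when a term vanishes or is negligible."""
    term = 1.0
    total = 1.0
    for k in range(max_terms):
        term *= (alpha + k) * (beta + k) / ((gamma + k) * (k + 1)) * x
        if term == 0.0:            # alpha or beta is a negative integer: series terminates
            break
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

# F(1, 1, 2; x) equals -log(1 - x)/x, one of the elementary special cases.
x = 0.5
print(hypergeometric(1, 1, 2, x), -math.log(1 - x) / x)

# alpha = -2 terminates the series after the power x^2: 1 - 6x + 6x^2.
print(hypergeometric(-2, 3, 1, 0.25), 1 - 6 * 0.25 + 6 * 0.25 ** 2)
```

The second example shows the termination property: once α + k reaches zero, every later term vanishes identically.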

The hypergeometric function satisfies the following differential
equation of second order, called the differential equation of Gauss:


    x(1 - x)u'' + [\gamma - (\alpha + \beta + 1)x]u' - \alpha\beta u = 0        (5-19.2)


This differential equation has the general form (18.3) of a linear
differential equation of second order, with

    A(x) = x(1 - x),   B(x) = \gamma - (\alpha + \beta + 1)x

In order to transform it into the self-adjoint form (18.6) we have to
obtain ρ(x) by the condition (18.4), which gives


    \rho(x) = x^{\gamma - 1} (1 - x)^{\alpha + \beta - \gamma}        (5-19.3)

and thus

    P(x) = x^{\gamma} (1 - x)^{\alpha + \beta + 1 - \gamma}        (5-19.4)
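The weight factor follows from a short integration of the condition (18.4). The steps can be written out as follows, taking A(x) = x(1 − x) and B(x) = γ − (α + β + 1)x from the Gaussian differential equation and dropping an irrelevant constant factor:

```latex
\begin{aligned}
\frac{\rho'}{\rho} &= \frac{B - A'}{A}
   = \frac{\gamma - 1 - (\alpha + \beta - 1)\,x}{x(1 - x)}
   = \frac{\gamma - 1}{x} - \frac{\alpha + \beta - \gamma}{1 - x}, \\[4pt]
\log \rho &= (\gamma - 1)\log x + (\alpha + \beta - \gamma)\log (1 - x), \\[4pt]
\rho(x) &= x^{\gamma - 1}(1 - x)^{\alpha + \beta - \gamma}, \qquad
P(x) = \rho(x)\,A(x) = x^{\gamma}(1 - x)^{\alpha + \beta + 1 - \gamma}.
\end{aligned}
```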


20. The Jacobi polynomials. An eigenvalue problem which shall 
lead to an orthogonal set of functions requires a self-adjoint differential
operator with self-adjoint boundary conditions. In the previous
section we have introduced the weight factor which made the
Gaussian differential equation self-adjoint. Now we will investigate
the question of boundary conditions. The realm of integration shall
be limited to the interval [0,1]. Then the boundary term becomes,
according to (18.7):


    \Big[ x^{\gamma} (1 - x)^{\delta} (vu' - uv') \Big]_0^1        (5-20.1)

where we have put

    \alpha + \beta + 1 - \gamma = \delta        (5-20.2)

We will assume that γ and δ are given positive constants:

    \gamma > 0,   \delta > 0        (5-20.3)


In this case the boundary term (1) seems to vanish all by itself and it 
seems that we do not get any boundary conditions for u(x). In 
actual fact the points x = 0 and x = 1 are singular points of our 
differential equation at which the solution goes generally to infinity. 
The demand that u(x) shall remain finite at these two points, is in 
itself a boundary condition at the two endpoints of the interval. 
The finiteness at x = 0 ruled out already one of the solutions of the 
Gaussian differential equation and reduced our problem to the 
hypergeometric series. Now the condition ‘“‘finiteness at x = 1” 
demands additional restrictions. 

Let us observe that for any given γ and δ we still have either α
or β freely at our disposal. Now the hypergeometric series (19.1) has
the remarkable property that it automatically terminates with the
power x^n if α is chosen as the negative integer −n. Then


    \alpha = -n,   \beta = n + \gamma + \delta - 1        (5-20.4)


and we obtain as the solution of our eigenvalue problem the polynomials

    P_n^{(\gamma,\delta)}(x) = F(-n, n + \gamma + \delta - 1, \gamma; x)        (5-20.5)


called “Jacobi polynomials.” They have the remarkable property
that they are orthogonal with respect to the weight factor (19.3):


    \rho(x) = x^{\gamma - 1} (1 - x)^{\delta - 1}        (5-20.6)


establishing a complete orthogonal function system for any choice
of γ and δ which is in harmony with the condition (3). For
applications certain special choices of γ and δ are of particular
interest.
The ultraspherical polynomials. The choice 


y= 6 (5-20.7) 


leads to symmetric weighting with respect to the mid-point x = 4 of 
the interval. In this case it is frequently preferable to put the origin 
of the reference system into the mid-point of the range by the 
transformation 


pees (5-20.8) 
2 
The new variable ¢ runs between —1 and +1. The polynomials 
thus obtained, called ‘‘ultraspherical,” are now alternatively even 
and odd polynomials, according to the even or odd character of n. 
If the new variable is again called x, we obtain the definition 


i= 
POY£) = F (—», n+2Qy—1, y; =) (5-20.9) 


The weight factor of orthogonality now becomes 
p(x) = (1 — x? ~? (5-20.10) 


Of particular interest are the following special cases: 

(a) The Legendre polynomials: $\gamma = 1$. Here the weight factor
becomes 1 and weighted orthogonality changes to ordinary orthogonality:

$$P_n(x) = F\left(-n,\; n + 1,\; 1;\; \frac{1 - x}{2}\right) \tag{5-20.11}$$
We will encounter these important polynomials later on, in quadrature
problems (cf. VI, 10 and 19).

(b) The Chebyshev polynomials: $\gamma = \tfrac12$. Here the weight factor
becomes $(1 - x^2)^{-1/2}$ and the transformation $x = \cos\theta$ removes
weighting. Then the polynomial

$$T_n(x) = F\left(-n,\; n,\; \tfrac12;\; \frac{1 - x}{2}\right) \tag{5-20.12}$$

is transformed into $\cos n\theta$ and we obtain the Fourier cosine
functions. These polynomials have the widest field of applications
(see Chapter VII) and have the greatest efficiency in approximating
arbitrary functions.

(c) The Chebyshev polynomials of the second kind: $\gamma = \tfrac32$:

$$U_n(x) = (n + 1)\,F\left(-n,\; n + 2,\; \tfrac32;\; \frac{1 - x}{2}\right) = \frac{\sin (n + 1)\theta}{\sin\theta} \qquad (\text{if } x = \cos\theta) \tag{5-20.13}$$


They are well suited for the polynomial representation of a function 
which assumes large values in the neighbourhood of x = +1 and 
remains small everywhere else (cf. III, 7, see also IV, 28). 

(d) The case $\gamma = \infty$. This case is of interest on account of its
relation to the Taylor series which can thus be conceived as the limit
of an orthogonal expansion. Moreover, if x is simultaneously
changed to

$$\xi = \sqrt{\gamma}\,x$$

the interval of $\xi$ is extended from $-\infty$ to $+\infty$ and the weight
factor (10) becomes $e^{-\xi^2}$. The resulting polynomials are called
"Hermitian":

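$$H_n(\xi) = \lim_{\gamma \to \infty} (2\sqrt{\gamma})^n\, F\left[-n,\; n + 2\gamma - 1,\; \gamma;\; \frac12\left(1 - \frac{\xi}{\sqrt{\gamma}}\right)\right] \tag{5-20.14}$$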
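The special cases (a) to (c) can be checked numerically by summing the terminating hypergeometric series term by term. The following Python sketch is only an illustration; the helper names `hyp_terminating` and `ultraspherical` are assumptions introduced here and do not appear in the text.

```python
import math

def hyp_terminating(n, b, c, z):
    """Sum F(-n, b, c; z); the series stops by itself after the power z**n."""
    term = 1.0
    total = 1.0
    for k in range(n):
        # ratio of two consecutive terms of the hypergeometric series
        term *= (-n + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def ultraspherical(n, gamma, x):
    """Evaluate F(-n, n + 2*gamma - 1, gamma; (1 - x)/2), cf. (5-20.9)."""
    return hyp_terminating(n, n + 2 * gamma - 1, gamma, (1 - x) / 2)

theta = 0.7                      # arbitrary test angle (assumption)
x = math.cos(theta)
for n in range(6):
    chebyshev = ultraspherical(n, 0.5, x)            # should equal cos(n*theta)
    second_kind = (n + 1) * ultraspherical(n, 1.5, x)  # sin((n+1)*theta)/sin(theta)
    print(n, chebyshev - math.cos(n * theta),
          second_kind - math.sin((n + 1) * theta) / math.sin(theta))
```

The printed differences should be at rounding level, which illustrates that the substitution $x = \cos\theta$ turns the cases $\gamma = \tfrac12$ and $\gamma = \tfrac32$ into trigonometric functions.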


The Laguerre polynomials. Among the Jacobi polynomials of
unsymmetric weighting ($\gamma \neq \delta$) the case $\delta \to \infty$ is of special interest.
This corresponds to letting $\beta$ go to infinity. But then the expansion
(19.1) of the hypergeometric series shows that we can introduce
$\xi = \beta x$ as a new variable and obtain in the limit, as $\beta$ goes to infinity,
the function

$$\Phi(\alpha, \gamma; \xi) = 1 + \frac{\alpha}{\gamma}\,\frac{\xi}{1!} + \frac{\alpha(\alpha + 1)}{\gamma(\gamma + 1)}\,\frac{\xi^2}{2!} + \cdots \tag{5-20.15}$$

which converges for all (real or complex) values of $\xi$. From the
standpoint of orthogonality the range of $\xi$ is now $[0, \infty]$ and the
weight factor (6) becomes

$$\rho(\xi) = \xi^{\gamma - 1} e^{-\xi}$$

The choice $\gamma = 1$ yields the Laguerre polynomials:
$$L_n(x) = n!\,\Phi(-n, 1; x) \tag{5-20.16}$$

They are orthogonal with respect to the weight factor $e^{-x}$, in the
infinite interval $[0, \infty]$. We have encountered this class of polynomials
in the inversion problem of the Laplace transform, in
connection with the transient response of an electric network
(cf. IV, 30).
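As a rough numerical illustration of the Laguerre case, the terminating series can be summed directly and the orthogonality with respect to $e^{-x}$ tested by a crude quadrature. This is a sketch under stated assumptions: the function names are hypothetical, and the infinite range is simply cut off at x = 60.

```python
import math

def confluent_terminating(n, c, x):
    """Sum Phi(-n, c; x) = 1 + (-n/c) x/1! + ..., which stops at x**n."""
    term, total = 1.0, 1.0
    for k in range(n):
        term *= (-n + k) / (c + k) * x / (k + 1)
        total += term
    return total

def laguerre(n, x):
    """L_n(x) = n! * Phi(-n, 1; x), the normalization of (5-20.16)."""
    return math.factorial(n) * confluent_terminating(n, 1.0, x)

def inner_product(m, n, upper=60.0, steps=60000):
    """Trapezoidal estimate of the integral of exp(-x) L_m(x) L_n(x) on [0, upper]."""
    h = upper / steps
    total = 0.0
    for j in range(steps + 1):
        x = j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.exp(-x) * laguerre(m, x) * laguerre(n, x)
    return h * total

print(inner_product(1, 2), inner_product(2, 2))
```

With this normalization the first value should be close to zero and the second close to $(2!)^2 = 4$.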


21. Interpolation by orthogonal polynomials. In §§ 14 and 15 we 
have discussed the dangers of equidistant polynomial interpolation. 
We have seen that the error of an arbitrary interpolation by powers 
depends on the ratio of two polynomials: 


$$Q_n(x, z) = \frac{F_n(x)}{F_n(z)} \tag{5-21.1}$$

Here $F_n(x)$ is the fundamental polynomial, formed out of the root
factors $x - x_i$, if $x_i$ are the points of interpolation. The point x is
some point of the interval $[-1, +1]$, while z is some point of the
complex plane. If the ratio (1) approaches zero with n growing to
infinity, the convergence of the interpolation is assured for all
x values. But in the case of equidistant interpolation only such
z values could be admitted which stayed outside of a certain oval- 
shaped region surrounding the interval of interpolation. This meant 
that not only had f (x) to be analytical in the given interval, but this 
analytical behavior had to be demanded everywhere within the 
oval-shaped domain. 

Very different is the behavior of orthogonal polynomials. We 
can expand an arbitrary f (x), which satisfies much less than analytical 
conditions within the interval of interpolation and need not even be 
defined outside the interval, into a complete orthogonal set of 
functions, according to the equation (16.12). However, the
coefficients $c_i$ of this expansion demand the evaluation of the
definite integrals (16.10) which are in actual fact but seldom at our
disposal. Hence it is of great practical advantage that we can obtain
an equivalent expansion with modified coefficients $c_i'$ which also
converges to f(x) with increasing n, and whose accuracy even for
finite n is not essentially worse than the expansion formed with the
help of the $c_i$ coefficients. This expansion is explicitly at our disposal
on the basis of an interpolation procedure, without any integrations.
We can link up this procedure with the "classical" series

$$f(x) = \sum_{i=1}^{\infty} c_i\,u_i(x) \tag{5-21.2}$$

by truncating the series to n terms and considering the remainder of
this expansion:

$$\eta_n(x) = f(x) - f_n(x) = \sum_{i=n+1}^{\infty} c_i\,u_i(x) \tag{5-21.3}$$

If we assume that this series has quick convergence, we may estimate
the remainder by keeping only the first term. In this case

$$\eta_n(x) = c_{n+1}\,u_{n+1}(x) \tag{5-21.4}$$

and this means that the error is zero at the roots of the first neglected
orthogonal function. But this again means that we shall obtain the
coefficients of the finite expansion

$$f_n(x) = \sum_{i=1}^{n} c_i'\,u_i(x) \tag{5-21.5}$$

by fitting the functional values f(x) at the zeros of the first neglected
orthogonal function $u_{n+1}(x)$ (provided that n such points can be
found inside the realm of orthogonality):

$$\sum_{i=1}^{n} c_i'\,u_i(\lambda_k) = f(\lambda_k), \qquad u_{n+1}(\lambda_k) = 0, \qquad (k = 1, 2, \ldots, n) \tag{5-21.6}$$


This gives a simultaneous system of n linear equations for the
determination of the $c_i'$. Although these coefficients will generally
not coincide with the classical coefficients (16.10), obtained by
integration, yet the error of the approximation will not be essentially
worse, while the numerical procedure is now simple and straightforward.

The price we have to pay for convergence is that the key values of
f(x) have to be given at certain prescribed nonequidistant points
$x = \lambda_k$, defined as the zeros of the first neglected orthogonal polynomial.
The fundamental polynomial in (1) has to be replaced by
the nth orthogonal polynomial $p_n(x)$ [the enumeration of these
polynomials starts with n = 0 and thus $u_{n+1}(x)$ is actually $p_n(x)$].
The greatly increased convergence is demonstrated if we form the
ratio (1). No longer is z confined to an oval-shaped region around
the x-axis. We can choose for z any complex value x + iy, arbitrarily
near to the x-axis, and yet $Q_n(x, z)$ converges to zero.

We can prove this statement explicitly in an elementary way for
the special case of the Chebyshev polynomials. Here

$$p_n(x) = \cos n\theta$$
if x is transformed into $\theta$ according to
$$x = \cos\theta$$

Now for any complex value of x, in fact for any x outside the range
±1, the angle $\theta$ must become complex. But then we can put

$$\theta = \mu + i\nu$$

$$\cos n\theta = \tfrac12\left(e^{in\mu - n\nu} + e^{-in\mu + n\nu}\right)$$

and this quantity tends to infinity with increasing n for any $\nu \neq 0$.

Hence we see that the analytical nature of f (x) outside the given 
interval is no longer demanded. In actual fact the interpolation 
converges to f (x) for any x of the given interval, without demanding 
analyticity for f (x) even in the given interval, as long as f (x) belongs 
to the class of functions of “bounded variation.” 

* That $p_n(x)$ must have n zeros in the interval of orthogonality can easily be
demonstrated. For, assuming that this is not the case, we would have
$$p_n(x) = (x - \lambda_1)\cdots(x - \lambda_m)\,q(x) \qquad (m < n)$$
where q(x) does not change its sign between a and b. But then
$$\int_a^b \rho(x)\,p_n(x)\,(x - \lambda_1)\cdots(x - \lambda_m)\,dx$$
cannot vanish since the integrand does not change its sign anywhere. And yet,
$p_n(x)$ being orthogonal to any polynomial of lower order (cf. text, later), the
integral should be zero. This contradiction establishes the theorem.

The interpolation of functions with the help of orthogonal polynomials
(not necessarily of the Jacobi type) has some further
properties which greatly facilitate the numerical procedure. One of
the remarkable properties of orthogonal polynomials is that they
satisfy a recurrence relation which connects three consecutive
polynomials. This recurrence relation is of the following general
form:

$$p_{n+1}(x) = (c_{n+1}x - a_n)\,p_n(x) - b_n\,p_{n-1}(x) \tag{5-21.7}$$


The existence of such a relation is a direct consequence of the fact
that an arbitrary $p_n(x)$ is orthogonal to all the previous polynomials
and therefore also to any power $x^{\alpha}$, $\alpha < n$, since such power is
merely a linear combination of the $p_i(x)$, of an order less than n. But
then we can conclude that more generally $p_n(x)$ must be orthogonal
to any polynomial of an order less than n. Let us denote the highest
coefficient of $p_n(x)$, i.e., the coefficient of $x^n$, by $\mu_n$. Then the
difference

$$p_{n+1}(x) - \frac{\mu_{n+1}}{\mu_n}\,x\,p_n(x)$$

eliminates the power $x^{n+1}$ and what remains is a polynomial of the
order n. This polynomial can certainly be obtained as a linear
combination of $p_0(x), p_1(x), \ldots, p_n(x)$:

$$p_{n+1}(x) - \frac{\mu_{n+1}}{\mu_n}\,x\,p_n(x) = \gamma_0 p_0(x) + \gamma_1 p_1(x) + \cdots + \gamma_n p_n(x) \tag{5-21.8}$$
Now, multiplying on both sides by $\rho(x)\,p_m(x)$, $m < n - 1$, we obtain
in consequence of orthogonality:

$$\gamma_m \int_a^b \rho(x)\,p_m^2(x)\,dx = -\frac{\mu_{n+1}}{\mu_n} \int_a^b \rho(x)\,p_n(x)\,x\,p_m(x)\,dx$$

But $x\,p_m(x)$ is a polynomial of a degree less than n, to which $p_n(x)$ is
orthogonal. Hence all $\gamma_m$ $(m < n - 1)$ must drop out on the right
side of (8) and the only non-zero coefficients are $\gamma_n$ and $\gamma_{n-1}$. The
existence of a "three-term recurrence relation" of the form (7) is thus
demonstrated and $c_{n+1}$ is explicitly obtained:

$$c_{n+1} = \frac{\mu_{n+1}}{\mu_n} \tag{5-21.9}$$

We will now assume that the orthogonal polynomials are
normalized in length:

$$\int_a^b \rho(x)\,p_n^2(x)\,dx = 1 \tag{5-21.10}$$


If $p_n^2(x)$ is written in the form
$$p_n(x)\,(\mu_n x^n + \cdots)$$

and the fact is taken into account that the dots represent a polynomial
of not higher than (n - 1)st order, we obtain

$$\mu_n \int_a^b \rho(x)\,p_n(x)\,x^n\,dx = 1 \tag{5-21.11}$$

Let us now multiply (7) by $p_{n-1}(x)\,\rho(x)$ and integrate on both
sides, taking into account that $x\,p_{n-1}(x)$ gives $\mu_{n-1}x^n$ plus a polynomial
of not higher than (n - 1)st order:

$$b_n = c_{n+1}\,\mu_{n-1} \int_a^b x^n\,p_n(x)\,\rho(x)\,dx = c_{n+1}\,\frac{\mu_{n-1}}{\mu_n}$$

We have thus obtained an explicit expression even for $b_n$:
$$b_n = \frac{c_{n+1}}{c_n} \tag{5-21.12}$$
and now we can write (7) in slightly different form, multiplying the
equation by $\mu_n/\mu_{n+1}$:
$$\beta_{n+1}\,p_{n+1}(x) = (x - \alpha_n)\,p_n(x) - \beta_n\,p_{n-1}(x) \tag{5-21.13}$$
where we have put

$$\alpha_k = a_k\,\frac{\mu_k}{\mu_{k+1}}, \qquad \beta_k = \frac{\mu_{k-1}}{\mu_k}$$


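These recurrence relations can be conceived as a sequence of
linear equations:
$$\begin{aligned}
(\alpha_0 - x)\,y_0 + \beta_1 y_1 &= 0 \\
\beta_1 y_0 + (\alpha_1 - x)\,y_1 + \beta_2 y_2 &= 0 \\
\beta_2 y_1 + (\alpha_2 - x)\,y_2 + \beta_3 y_3 &= 0 \\
\cdots\qquad &
\end{aligned} \tag{5-21.14}$$

This never-ending sequence terminates, however, if we consider
those particular values $x = \lambda_k$ for which $p_n(x)$ vanishes:

$$y_n = p_n(\lambda_k) = 0, \qquad (k = 1, 2, \ldots, n)$$

We then have a homogeneous set of n linear equations whose
determinant must vanish:

$$\begin{vmatrix}
\alpha_0 - \lambda & \beta_1 & & & \\
\beta_1 & \alpha_1 - \lambda & \beta_2 & & \\
 & \beta_2 & \alpha_2 - \lambda & \beta_3 & \\
 & & \ddots & \ddots & \ddots \\
 & & & \beta_{n-1} & \alpha_{n-1} - \lambda
\end{vmatrix} = 0 \tag{5-21.15}$$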


We see that we obtain a regular eigenvalue problem of a symmetric
matrix of the order n. The eigenvalues $\lambda = \lambda_k$ are the roots of the
equation $p_n(\lambda) = 0$. The eigensolutions are the principal axes of a
symmetric matrix which are automatically orthogonal to each other.
These eigensolutions are

$$u_i = \left[\,p_0(\lambda_i),\; p_1(\lambda_i),\; \ldots,\; p_{n-1}(\lambda_i)\,\right]$$

Thus we obtain the orthogonality relations

$$\sum_{\alpha=0}^{n-1} p_\alpha(\lambda_i)\,p_\alpha(\lambda_k) = 0 \qquad (i \neq k) \tag{5-21.16}$$
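The statements about the determinant (15) and the orthogonality (16) can be tried out on a small case. The sketch below assumes the Legendre polynomials on [-1, 1], normalized according to (10), for which $\alpha_k = 0$ and $\beta_k = k/\sqrt{4k^2 - 1}$; these values and the function names are assumptions supplied here for illustration.

```python
import math

def legendre_beta(k):
    """beta_k = mu_{k-1}/mu_k for the normalized Legendre polynomials."""
    return k / math.sqrt(4.0 * k * k - 1.0)

def orthonormal_legendre(n, x):
    """Return [p_0(x), ..., p_n(x)] from the recurrence (5-21.13) with alpha_k = 0."""
    p = [1.0 / math.sqrt(2.0)]
    prev = 0.0
    for k in range(n):
        beta_k = legendre_beta(k) if k > 0 else 0.0
        nxt = (x * p[k] - beta_k * prev) / legendre_beta(k + 1)
        prev = p[k]
        p.append(nxt)
    return p

def tridiagonal_determinant(n, lam):
    """Determinant (5-21.15) with alpha_k = 0, built up order by order."""
    d_prev, d = 1.0, -lam
    for k in range(1, n):
        d_prev, d = d, -lam * d - legendre_beta(k) ** 2 * d_prev
    return d

n = 3
zeros = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]   # roots of p_3
dets = [tridiagonal_determinant(n, lam) for lam in zeros]
u = [orthonormal_legendre(n, lam)[:n] for lam in zeros]
dot_01 = sum(a * b for a, b in zip(u[0], u[1]))
print(dets, dot_01)
```

Both the determinants at the three zeros and the scalar product of two different eigensolutions should vanish up to rounding.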


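We can make the $p_i(\lambda_k)$ matrix into a truly orthogonal matrix by
normalizing the length of each column to 1. This means that we put

$$v_i = \rho_i\,u_i$$
where
$$\rho_i = \frac{1}{\sqrt{\displaystyle\sum_{\alpha=0}^{n-1} p_\alpha^2(\lambda_i)}} \tag{5-21.17}$$

Since, however, an orthogonal matrix has the property that the
orthogonality relations hold between the rows not less than between
the columns, we obtain

$$\sum_{\alpha=1}^{n} \rho_\alpha^2\,p_i(\lambda_\alpha)\,p_k(\lambda_\alpha) = 0 \qquad (i \neq k) \tag{5-21.18}$$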


This relation is a remarkable counterpart of the orthogonality
condition

$$\int_a^b \rho(x)\,p_i(x)\,p_k(x)\,dx = 0 \qquad (i \neq k)$$

It shows that the orthogonal polynomials possess a second orthogonality
property. They are orthogonal with respect to integration.
But they are also orthogonal with respect to summation, if the key
points are chosen as the zeros of the first neglected polynomial (in
both cases the orthogonality is of the weighted type, but the two
weight factors are quite different).

This second orthogonality property greatly reduces the labor of
interpolating with the help of orthogonal polynomials. Our aim was
to solve the linear equations (6):

$$\begin{aligned}
c_0'\,p_0(\lambda_1) + c_1'\,p_1(\lambda_1) + \cdots + c_{n-1}'\,p_{n-1}(\lambda_1) &= f(\lambda_1) \\
\cdots\qquad & \\
c_0'\,p_0(\lambda_n) + c_1'\,p_1(\lambda_n) + \cdots + c_{n-1}'\,p_{n-1}(\lambda_n) &= f(\lambda_n)
\end{aligned}$$
The orthogonality condition (18), together with the normalization
condition

$$\sum_{\alpha=1}^{n} \rho_\alpha^2\,p_i^2(\lambda_\alpha) = 1 \tag{5-21.19}$$

allows us to solve the linear set explicitly and obtain the coefficients
$c_i'$ in the form:

$$c_i' = \sum_{\alpha=1}^{n} \rho_\alpha^2\,f(\lambda_\alpha)\,p_i(\lambda_\alpha) \tag{5-21.20}$$


These equations can be conceived as natural counterpart of the
determining equations (16.10) of the coefficients $c_i$. The great
advantage of the new coefficients is that they are numerically
available by a simple summation process (the coefficients of
which can be pretabulated), without any integration.
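For the Chebyshev polynomials the summation (20) takes a particularly simple form, which may be sketched as follows. The normalization used below ($p_0 = 1/\sqrt{\pi}$, $p_i = \sqrt{2/\pi}\cos i\theta$, with equal weights $\pi/n$ at the zeros of $\cos n\theta$) is an assumption made for this illustration, as are the function names and the test function $e^x$.

```python
import math

def basis_norm(i):
    """Normalizing factor of the i-th Chebyshev function for weight (1 - x^2)^(-1/2)."""
    return 1.0 / math.sqrt(math.pi) if i == 0 else math.sqrt(2.0 / math.pi)

def chebyshev_fit(f, n):
    """Coefficients c'_0 ... c'_{n-1} obtained by summation over the zeros of cos(n*theta)."""
    thetas = [(2 * a + 1) * math.pi / (2 * n) for a in range(n)]
    rho2 = math.pi / n            # the weights are all equal in the Chebyshev case
    return [sum(rho2 * f(math.cos(t)) * basis_norm(i) * math.cos(i * t) for t in thetas)
            for i in range(n)]

def chebyshev_eval(coeffs, x):
    """Evaluate the finite expansion at a point x of [-1, 1]."""
    theta = math.acos(x)
    return sum(c * basis_norm(i) * math.cos(i * theta) for i, c in enumerate(coeffs))

coeffs = chebyshev_fit(math.exp, 8)
print(chebyshev_eval(coeffs, 0.3) - math.exp(0.3))
```

No integration and no solution of linear equations is needed; the expansion reproduces the data exactly at the eight key points and stays close to the function in between.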

Here again the Chebyshev polynomials are endowed with
superior properties. For them the weight factors $\rho_\alpha^2$ become all
equal and thus weighted orthogonality becomes ordinary orthogonality.
Moreover, the coefficients $p_i(\lambda_\alpha)$ are here easily available
since they are the simple and well-tabulated trigonometric functions
$\cos k\theta$ at angles $\theta_\alpha$ which are easily accessible (cf. IV, 16).

Other special cases of potential interest are the Legendre polynomials
and the Laguerre polynomials. Although the zeros of
these polynomials, together with the weights $\rho_\alpha$, have been calculated,¹
an elaborate tabulation of the matrices $p_i(\lambda_k)$ is not available at the
present time.


1 The weights $\rho_\alpha^2$ are in a remarkable relation to the weights of the Gaussian
quadrature. Let the interpolation of f(x) by orthogonal polynomials serve the
purpose of obtaining a parexic value of the definite integral

$$A = \int_a^b \rho(x)\,f(x)\,dx$$

("Gaussian quadrature," cf. VI, 10). Then the orthogonality of all $p_i(x)$ to
$p_0(x) = p_0 = \text{const.}$ shows that every term of the expansion (5), except the first,
vanishes in the process of integration and we obtain

$$\bar{A} = c_0'\,p_0 \int_a^b \rho(x)\,dx$$

But then, in view of (20),
$$\bar{A} = p_0^2 \int_a^b \rho(x)\,dx\; \sum_{\alpha=1}^{n} \rho_\alpha^2\,f(\lambda_\alpha) \tag{5-21.21}$$

The factor in front of the summation sign is 1, considering the normalization of
$p_0(x)$. On the other hand, the weights $w_\alpha$ of the Gaussian quadrature
$$\bar{A} = \sum_{\alpha=1}^{n} w_\alpha\,f(\lambda_\alpha)$$
are tabulated. Comparison with (21) gives

$$\rho_\alpha^2 = w_\alpha$$
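The relation between the weights $\rho_\alpha^2$ of (17) and the Gaussian weights can be checked on the three-point Legendre case, whose quadrature weights 5/9, 8/9, 5/9 are well known. The code is a sketch only; the function names are hypothetical and the normalized Legendre polynomials are generated from the standard recurrence of the unnormalized ones.

```python
import math

def orthonormal_legendre(count, x):
    """Return [p_0(x), ..., p_{count-1}(x)], normalized to unit length on [-1, 1]."""
    values = []
    p_prev, p = 0.0, 1.0          # unnormalized Legendre P_{-1} (taken as 0) and P_0
    for k in range(count):
        values.append(math.sqrt((2 * k + 1) / 2.0) * p)
        # (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return values

def rho_squared(n, node):
    """rho^2 = 1 / sum_{k=0}^{n-1} p_k(node)^2, cf. (5-21.17)."""
    return 1.0 / sum(v * v for v in orthonormal_legendre(n, node))

nodes = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]   # zeros of the Legendre polynomial of order 3
weights = [rho_squared(3, t) for t in nodes]
print(weights)
```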




VI


QUADRATURE METHODS 


1. Historical notes. The problem of areas challenged the imagination
of scientific thinking from earliest dates. The "problem of
Dido," usually formulated as the problem of enclosing a maximum
area by a chain of given length, originated in prehistoric times. The 
determination of areas of more or less complicated shape played an 
important part in agricultural civilizations, and both the old 
Babylonians and the old Egyptians were acquainted with a variety of 
formulas, expressed, of course, in verbal rather than algebraic form, 
by which areas included by straight lines could be evaluated. The 
Greeks with their advanced knowledge of geometry went into much 
greater details. The area of a circle offered a particularly challenging 
problem and led to establishment of the rigorous methods of limit 
theory, called in ancient times the “method of exhaustion.” True 
integrations were performed by Archimedes, who used the method of 
inscribed and circumscribed polygons, thus obtaining upper and 
lower bounds which approached each other indefinitely. With the 
advent of infinitesimal calculus many areas could be evaluated by the 
discovery that integration and differentiation are inverse processes. 
Moreover, the simple trapezoidal procedure of the ancients became 
refined by using polynomials of second and higher orders for inter- 
polation of equidistant data. An eminently useful formula, based on 
parabolas of second order, was introduced by the English mathematician
Th. Simpson (in 1743) and is usually quoted as "Simpson's
rule." Later Gauss (1814) invented a particularly ingenious and
important method for obtaining areas, based on the properties of the 
Legendre polynomials. 

The Swedish mathematician H. Fredholm introduced in 1900 his
"integral equations" which in later decades became of fundamental
importance in the solution of boundary-value problems, eigenvalue
problems, and many problems of advanced statistics. If the integrals
here encountered are evaluated by the simple trapezoidal rule, an
integral equation becomes replaceable by a large system of ordinary
linear algebraic equations. But frequently more convenient solutions
are available if more advanced quadrature methods are applied to the
definite integral which characterizes the left side of an integral
equation.


2. Quadrature by planimeters. The problem of determining the 
area under a given curve is frequently referred to as “mechanical 
quadrature,” although the implication does not mean that a 
mechanical instrument shall be used for the solution of the problem. 
In actual fact, however, mechanical instruments do exist which 
perform the quadrature operation by mechanical or electrical means. 
They are called "planimeters" if their basic principle is that we trace
by a pointer the circumference of the area to be evaluated. The 
reading of the instrument gives the area directly in square 
inches. 

Carefully constructed planimeters are rather expensive instruments 
and their accuracy is limited. They require careful drawing and 
tracing of the contour of the unknown area. Frequently a simple 
calculation can give more accurate results than the planimeter, 
particularly if an empirical curve is not observed as a continuous 
curve but as a sequence of discrete ordinates. The calculation 
utilizes the observed ordinates only and does not require that the 
points of observation shall be interpolated by a more or less arbitrary 
graphical procedure. 


3. The trapezoidal rule. The oldest method of approximating the
area under a continuous curve is the method of inscribed polygons,
known today as the "trapezoidal rule." We connect the observed
ordinates by straight lines and replace the area under the curve by
the area under the polygon. If the observed ordinates

$$y = y_0,\; y_1,\; y_2,\; \ldots,\; y_n \tag{6-3.1}$$
belong to the abscissa values

$$x = x_0,\; x_1,\; x_2,\; \ldots,\; x_n \tag{6-3.2}$$

the elementary formula for the area of a trapezoid gives the following
result for the area under the polygon.

$$\begin{aligned}
\bar{A} &= \tfrac12 (y_0 + y_1)(x_1 - x_0) + \cdots + \tfrac12 (y_{n-1} + y_n)(x_n - x_{n-1}) \\
&= \tfrac12 \left[\, y_0(x_1 - x_0) + y_1(x_2 - x_0) + y_2(x_3 - x_1) + \cdots + y_{n-1}(x_n - x_{n-2}) + y_n(x_n - x_{n-1}) \,\right]
\end{aligned} \tag{6-3.3}$$
For equidistant spacing,
$$x_1 - x_0 = x_2 - x_1 = x_3 - x_2 = \cdots = x_n - x_{n-1} = h \tag{6-3.4}$$
the formula (3) assumes the simpler form

$$\bar{A} = h\left(\tfrac12 y_0 + y_1 + y_2 + \cdots + y_{n-1} + \tfrac12 y_n\right) \tag{6-3.5}$$


which means that, apart from the factor h, we simply add up all the
observed ordinates, but apply the weight factor $\tfrac12$ to the two limiting
ordinates.

We know from the fundamental theorem of integral calculus that
the area under the curve characterized by y = f(x) can be defined as
the limit to which $\bar{A}$ tends as h approaches zero.

$$A = \int_a^b f(x)\,dx = \lim_{h \to 0} h\left(\tfrac12 y_0 + y_1 + \cdots + y_{n-1} + \tfrac12 y_n\right) \tag{6-3.6}$$

For theoretical purposes this limit process is quite satisfactory since
for mathematically given functions we can frequently obtain the
limit by analytical tools. This was the procedure of Archimedes in
evaluating the center of masses and center of buoyancies of many
complicated figures.

From the viewpoint of practical computation the trapezoidal rule
gives quite satisfactory results if we possess a sufficient number of
ordinates. The calculation is extremely simple since straightforward
addition of a given set of ordinates on the comptometer is a simple
and quick process. Very often, however, it is cumbersome to
ascertain the large collection of ordinates which are demanded by the
trapezoidal method for a sufficiently close approximation.
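The two forms of the trapezoidal rule, (3) for arbitrary abscissas and (5) for equidistant ones, translate directly into a few lines of code. The following sketch is illustrative; the function names and the sample data ($e^x$ on [0, 4] with h = 0.5) are assumptions chosen for the demonstration.

```python
import math

def trapezoid(ys, h):
    """Equidistant trapezoidal rule (6-3.5): end ordinates get the weight 1/2."""
    return h * (0.5 * ys[0] + sum(ys[1:-1]) + 0.5 * ys[-1])

def trapezoid_general(xs, ys):
    """Trapezoidal rule (6-3.3) for arbitrary, not necessarily equidistant abscissas."""
    return sum(0.5 * (ys[k] + ys[k + 1]) * (xs[k + 1] - xs[k])
               for k in range(len(xs) - 1))

h = 0.5
xs = [k * h for k in range(9)]
ys = [math.exp(x) for x in xs]
approx = trapezoid(ys, h)
exact = math.exp(4.0) - 1.0
print(approx, exact)
```

With only eight panels the result overshoots the true area noticeably, which illustrates why a large collection of ordinates is needed for a close approximation.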


1 We use the notation "bar" consistently to indicate an "approximation."
Hence $\bar{A}$ is an approximation of the true area A; similarly $\bar{\eta}$ is an approximate
value of $\eta$.

4. Simpson's rule. Straight lines are too rigid for a really satisfactory
approximation of curves. If we want to imitate a curve by
drawing a succession of straight lines, we need a great many small
lines for this purpose. We can obviously fare much better if we use
parabolas of second order for the approximation. With the help of
such parabolas we can approximate a curve to a remarkable degree
without changing the approximating parabolas too often. A large
number of very short straight sections is thus replaced by a small
number of much longer parabolic sections.

Hence we will increase the accuracy of our quadrature formula very
considerably if, instead of connecting two consecutive ordinates by a
straight line, we connect three consecutive ordinates by a parabola.
We can always find a parabola of the form

$$y = a + bx + cx^2 \tag{6-4.1}$$

which will fit three given points. By this method we come much
closer to the actual curve than by mere straight lines. The error
committed is thus much smaller.

[Figure: the curve y = f(x); I = approximation by straight lines, II = approximation by parabolic arcs]


The resulting procedure is known as Simpson's method, and the
resulting formula is called Simpson's rule. We divide the total area
into an even number of equal panels, thus reading an odd number of
ordinates because the two end ordinates are included in our reading.
For the sake of convenience the width of each panel is chosen as 1.
In each double panel we approximate the curve by a parabola of
second order.

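Let us consider for example the first double panel, composed of the
three ordinates $y_0, y_1, y_2$. We can expand y = f(x) around the point
x = 1 into a local power series, making use of the method of central
differences (cf. V, 3). According to Stirling's formula we have

$$f(1 + t) = f(1) + t\,\delta f(1) + \frac{t^2}{2}\,\delta^2 f(1) + \cdots \tag{6-4.2}$$
Here
$$\begin{aligned}
f(1) &= y_1 \\
\delta f(1) &= \tfrac12 (y_2 - y_0) \\
\delta^2 f(1) &= y_2 - 2y_1 + y_0
\end{aligned} \tag{6-4.3}$$

Here we approximated the curve between x = 0 and x = 2 by a
parabola of second order which coincides with the actual curve at the
three points of interpolation x = 0, 1, 2.

The area under the approximating parabola can be obtained by
integrating between the points x = 0 and x = 2.

$$\begin{aligned}
\bar{A}_{02} &= \int_{-1}^{+1} f(1 + t)\,dt \\
&= \left[\, t\,f(1) + \frac{t^2}{2}\,\delta f(1) + \frac{t^3}{6}\,\delta^2 f(1) \,\right]_{-1}^{+1} \\
&= 2 f(1) + \tfrac13\,\delta^2 f(1) \\
&= 2 y_1 + \tfrac13 (y_2 - 2y_1 + y_0) \\
&= \tfrac13 y_0 + \tfrac43 y_1 + \tfrac13 y_2
\end{aligned} \tag{6-4.4}$$
We repeat exactly the same process for the areas $\bar{A}_{24}, \bar{A}_{46}, \ldots$ until the
total area is exhausted. We then get
$$\begin{aligned}
\bar{A} &= \tfrac13 y_0 + \tfrac43 y_1 + \tfrac13 y_2 \\
&\quad + \tfrac13 y_2 + \tfrac43 y_3 + \tfrac13 y_4 \\
&\quad + \cdots \\
&= \tfrac23 \left[\, \tfrac12 y_0 + y_2 + y_4 + \cdots + \tfrac12 y_{2n} \,\right] + \tfrac43 \left[\, y_1 + y_3 + y_5 + \cdots + y_{2n-1} \,\right]
\end{aligned} \tag{6-4.5}$$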
+ Sla + Ys + Ys + + Yenal 


This is Simpson's formula. We separate even and odd ordinates and
apply the even and odd ordinates with different weights, instead of
the same weights as we have done in the earlier trapezoidal method.
This discrimination of weights greatly increases the accuracy of the
result.

If the width of each panel is not 1 but h, we merely multiply by h,
and the final formula becomes

$$\bar{A} = \tfrac23 h \left[\, \tfrac12 y_0 + y_2 + \cdots + \tfrac12 y_{2n} + 2\,(y_1 + y_3 + \cdots + y_{2n-1}) \,\right] \tag{6-4.6}$$


The necessity of an even number of panels is sometimes an
inconvenient limitation for the use of Simpson's formula. We can
avoid this difficulty by using a different construction for the last three
(or first three) panels, if the number of panels happens to be odd.
Consider the four ordinates $y_0, y_1, y_2, y_3$. We put the origin of our
reference system into the point x = 1.5 and approximate the
function

$$\tfrac12 \left[\, f(1.5 + t) + f(1.5 - t) \,\right]$$
by a parabola
$$y = a + ct^2$$

since we know in advance that the area under the curve between
t = ±1.5 is influenced only by the even part of the function. The
conditions at t = 0.5 and 1.5 determine the coefficients a and c:

$$\begin{aligned}
c &= \tfrac14 \left[\, y_0 + y_3 - (y_1 + y_2) \,\right] \\
a &= \tfrac{1}{16} \left[\, 9\,(y_1 + y_2) - (y_0 + y_3) \,\right]
\end{aligned}$$

and we obtain for the area of the first three panels

$$\bar{A}_{03} = \int_{-1.5}^{+1.5} (a + ct^2)\,dt = 3\left(a + \tfrac34 c\right) = \tfrac38 \left[\, 3\,(y_1 + y_2) + (y_0 + y_3) \,\right] \tag{6-4.7}$$

The corresponding formula for the width h of the panels becomes

$$\bar{A}_{03} = \frac{3h}{8} \left[\, 3\,(y_1 + y_2) + (y_0 + y_3) \,\right] \tag{6-4.8}$$


Hence if the number of panels is odd, we apply formula (8) to the 
first three, or last three, panels. Then the remaining number of 
panels is even, and here Simpson’s formula (6) comes into operation. 
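The combination of formula (6) with the three-panel formula (8) can be sketched as follows. The function names are hypothetical, and the choice of applying (8) to the first three panels (rather than the last three) is an arbitrary assumption of this illustration.

```python
import math

def simpson(ys, h):
    """Simpson's formula (6-4.6); requires an even number of panels."""
    if len(ys) < 3 or (len(ys) - 1) % 2 != 0:
        raise ValueError("Simpson's formula needs an even number of panels")
    even = 0.5 * ys[0] + sum(ys[2:-1:2]) + 0.5 * ys[-1]
    odd = sum(ys[1::2])
    return (2.0 * h / 3.0) * (even + 2.0 * odd)

def three_panel(y0, y1, y2, y3, h):
    """Formula (6-4.8) for three consecutive panels."""
    return (3.0 * h / 8.0) * (3.0 * (y1 + y2) + (y0 + y3))

def integrate(ys, h):
    """Use (6-4.8) on the first three panels if the number of panels is odd."""
    panels = len(ys) - 1
    if panels % 2 == 0:
        return simpson(ys, h)
    if panels < 3:
        raise ValueError("need at least three panels when the number is odd")
    if panels == 3:
        return three_panel(ys[0], ys[1], ys[2], ys[3], h)
    return three_panel(ys[0], ys[1], ys[2], ys[3], h) + simpson(ys[3:], h)

h = 0.5
cubic = [(k * h) ** 3 for k in range(8)]          # seven panels on [0, 3.5]
exponential = [math.exp(k * h) for k in range(9)]  # eight panels on [0, 4]
print(integrate(cubic, h), integrate(exponential, h))
```

Both formulas are exact for a cubic, so the first value should agree with $3.5^4/4$ up to rounding.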


5. The accuracy of Simpson's formula. We can estimate the
accuracy of the parabolic approximation by the following consideration.
If we integrate Stirling's formula between the limits ±1,
all the terms with odd differences drop out, since the area under an odd
function, taken between symmetric limits, is zero. Now, assuming
that the sequence of ordinates is sufficiently dense, the Stirling
expansion will be sufficiently convergent to estimate the truncation
error by the first neglected term. The first term we have neglected in
the integration (4.4) is

$$\frac{\delta^4 f(1)}{24} \int_{-1}^{+1} t^2 (t^2 - 1)\,dt = -\frac{\delta^4 f(1)}{90} \tag{6-5.1}$$

The same consideration holds for every double panel, from k = 1 to
k = 2n - 1. We thus obtain for the complete area

$$\int_0^{2n} f(x)\,dx = \bar{A} - \frac{1}{90} \sum_{k=0}^{n-1} \delta^4 f(2k + 1) \tag{6-5.2}$$

This expression for the error is not very convenient, since it requires
setting up an elaborate difference table. However, if f(x) is sufficiently
smooth, the difference coefficient may be replaced by the
derivative and the sum by an integral. Then the estimated error $\bar{\eta}$ of
the quadrature appears in the form:

$$\bar{\eta} = -\frac{h^4}{180} \left[\, f'''(b) - f'''(a) \,\right] \tag{6-5.3}$$

where a is the lower and b the upper limit of the quadrature, and the
width of the panels is no longer 1 but h. (For a more general
estimation of the error, not assuming smoothness, cf. 17.6.)

More convenient and more reliable is another method of error
estimation which will be discussed later (cf. § 12). If this method is
applied to Simpson's quadrature procedure, we obtain the following
error estimate:

$$\bar{\eta} = \frac{4h}{15} \left[\, \left(\tfrac12 y_0 + y_2 + \cdots + \tfrac12 y_{2n}\right) - \left(y_1 + y_3 + \cdots + y_{2n-1}\right) \,\right] - \frac{h^2}{15} \left[\, f'(b) - f'(a) \,\right] \tag{6-5.4}$$
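The two error estimates (3) and (4) can be compared with the true error on a simple case. The following sketch assumes the test function $e^x$ on [0, 4] with h = 0.5, for which all derivatives are again $e^x$; the variable names are chosen here for illustration only.

```python
import math

h, panels = 0.5, 8
ys = [math.exp(k * h) for k in range(panels + 1)]

# even ordinates (end ordinates halved) and odd ordinates, as in (6-4.6)
even = 0.5 * ys[0] + sum(ys[2:-1:2]) + 0.5 * ys[-1]
odd = sum(ys[1::2])
simpson_value = (2.0 * h / 3.0) * (even + 2.0 * odd)
true_error = (math.exp(4.0) - 1.0) - simpson_value

# (6-5.3): uses the third derivative at the two end points
estimate_53 = -h ** 4 / 180.0 * (math.exp(4.0) - math.exp(0.0))
# (6-5.4): uses the ordinates and the first derivative at the two end points
estimate_54 = (4.0 * h / 15.0) * (even - odd) - h ** 2 / 15.0 * (math.exp(4.0) - math.exp(0.0))

print(true_error, estimate_53, estimate_54)
```

In this example both estimates have the correct sign and order of magnitude, and the estimate (4) lies closer to the true error than (3).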


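6. The accuracy of the trapezoidal rule. In order to estimate the
accuracy of the trapezoidal rule, we make use of Bessel's formula
which interpolates on the half lines.

$$f(\tfrac12 + t) = \delta^0 f(\tfrac12) + t\,\delta f(\tfrac12) + \frac{\delta^2 f(\tfrac12)}{2}\left(t^2 - \tfrac14\right) + \cdots \tag{6-6.1}$$

Integrating between the limits $t = \pm\tfrac12$, we obtain the area of one
single panel.

$$A_{01} = \int_{-1/2}^{+1/2} f(\tfrac12 + t)\,dt = \delta^0 f(\tfrac12) + \frac{\delta^2 f(\tfrac12)}{2} \left[\, \frac{t^3}{3} - \frac{t}{4} \,\right]_{-1/2}^{+1/2} + \cdots \tag{6-6.2}$$

Adding up the area of every panel and neglecting higher order terms,
we obtain

$$A = \sum_{k=0}^{n-1} \delta^0 f(k + \tfrac12) - \frac{1}{12} \sum_{k=0}^{n-1} \delta^2 f(k + \tfrac12) = \bar{A} - \frac{1}{12} \left[\, \delta f(n) - \delta f(0) \,\right] \tag{6-6.3}$$

The formula for the panel width h becomes
$$A = \bar{A} - \frac{h}{12} \left[\, \delta f(nh) - \delta f(0) \,\right] \tag{6-6.4}$$

and replacing again differences by derivatives we finally obtain the
formula which corresponds to (5.3).

$$\bar{\eta} = -\frac{h^2}{12} \left[\, y'(b) - y'(a) \,\right] \tag{6-6.5}$$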


The comparison of (5.3) and (5) shows how much more accurate 
Simpson’s rule is than the trapezoidal rule. The second power of h 
is changed to the fourth power, while the numerical factor 12 in the 
denominator is changed to the much larger factor 180. On the other 
hand, the first derivative is changed to the third derivative, which is 
often much less smooth than the first derivative. 


7. The trapezoidal rule with end correction. We can perceive the 
significance of the trapezoidal formula from still a different view- 
point by introducing the Fourier series in our analysis. Let us assume 


§7 The Trapezoidal Rule with End Correction 387 


that f(x) is given in the range [0,1] and let us expand it into a 
Fourier cosine series. 


f(e) = ła + 4, cos nx + a, cos 27% + + (6-7.1) 


If we now substitute for x the values 


AAA 
n n 


eee 
? > 


and perform the summation 


r-i iross) (l) or 


n 
we obtain on the right side 
A = hy + azn + Og, +H °° (6-7.3) 


Now by the general definition of the Fourier coefficients, 
1 
a, = 2 Í f(x) cos kaa dx (6-7.4) 
0 


Hence 3a, is the true area A under the curve and we obtain 
A = A — (Gan + gy + 0) (6-7.5) 


We thus see that the trapezoidal rule will hold the better the more 
convergent the Fourier series (1) is. Now we know that the con- 
vergence of the Fourier series is decided by the analytical behavior 
of the function. The more continuous the function is in itself, and 
in its derivatives, the more convergent will the Fourier series be. In 
our present problem of a pure cosine series we have in the full range 
[—1,1] an even function which remains continuous at the boundary 
points, while the derivative will generally have a discontinuity at the 
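points x = 0 and x = 1. The order of magnitude of the coefficients
$a_k$ will be determined by this discontinuity. If in (4) we integrate by
parts, we obtain

$$\begin{aligned}
a_k &= -\frac{2}{k\pi} \int_0^1 f'(x) \sin k\pi x\,dx \\
&= \frac{2}{k^2\pi^2} \Big[\, f'(x) \cos k\pi x \,\Big]_0^1 - \frac{2}{k^2\pi^2} \int_0^1 f''(x) \cos k\pi x\,dx
\end{aligned} \tag{6-7.6}$$

Hence the coefficients of even order become
$$a_{2k} = \frac{1}{2k^2\pi^2} \left[\, f'(1) - f'(0) \,\right] + \varepsilon \tag{6-7.7}$$

where $\varepsilon$ becomes small in comparison with the first term if k is large.
If it so happens, however, that the boundary condition

$$f'(1) = f'(0) \tag{6-7.8}$$
is satisfied, we can repeat the method of integrating by parts once
more and obtain

$$a_{2k} = -\frac{1}{8k^4\pi^4} \left[\, f'''(1) - f'''(0) \,\right] + \varepsilon \tag{6-7.9}$$

In the first case, equation (5) gives

$$A = \bar{A} - \frac{1}{2n^2\pi^2} \left[\, f'(1) - f'(0) \,\right] \left( 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots \right) + \varepsilon \tag{6-7.10}$$
while in the second case,

$$A = \bar{A} + \frac{f'''(1) - f'''(0)}{8n^4\pi^4} \left( 1 + \frac{1}{2^4} + \frac{1}{3^4} + \cdots \right) + \varepsilon \tag{6-7.11}$$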


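The infinite sums appearing in these equations can be evaluated in
terms of the "Bernoulli numbers" $B_{2k}$, on account of the relation

$$B_{2k} = \frac{2\,(2k)!}{(2\pi)^{2k}} \left( 1 + \frac{1}{2^{2k}} + \frac{1}{3^{2k}} + \cdots \right) \tag{6-7.12}$$
The first Bernoulli numbers are
$$B_2 = \frac16, \qquad B_4 = \frac{1}{30}, \qquad B_6 = \frac{1}{42}$$
Hence
$$\begin{aligned}
\frac{1}{\pi^2} \left( 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots \right) &= \frac16 \\
\frac{1}{\pi^4} \left( 1 + \frac{1}{2^4} + \frac{1}{3^4} + \cdots \right) &= \frac{1}{90}
\end{aligned} \tag{6-7.13}$$

We thus obtain from (10) as an estimate of the error of the trapezoidal
rule:

$$A = \bar{A} - \frac{1}{12n^2} \left[\, f'(1) - f'(0) \,\right] = \bar{A} - \frac{h^2}{12} \left[\, f'(b) - f'(a) \,\right] \tag{6-7.14}$$

in agreement with our previous result (6.5).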


§7 The Trapezoidal Rule with End Correction 389 


However, by a proper modification of f(x) we can eliminate the 
jump in the first derivative and put formula (11) in operation. Let 
us consider the function 

O) —f'O 
For this function the boundary condition (8) is satisfied (replacing 
f by g) and we obtain the estimate [cf. (11) and (13)], 


7 m (1 = g" (0) 
f g(x) dx = A, ee 


(6-7.15) 


(6-7.16) 
= 4+ E "O-O 


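Now by the definition of g(x),

$$\int_0^1 g(x)\,dx = \int_0^1 f(x)\,dx - \tfrac16\,[f'(1) - f'(0)] \tag{6-7.17}$$

while the operation $\bar A_g$ is composed of the following two parts:

$$\bar A_g = \frac1n\Big[\tfrac12 f(0) + f\big(\tfrac1n\big) + \cdots + f\big(\tfrac{n-1}{n}\big) + \tfrac12 f(1)\Big]
- \frac{f'(1) - f'(0)}{2n^3}\,{\sum_{k=0}^{n}}' k^2 \tag{6-7.18}$$

where the prime indicates that the two end terms of the sum are halved, so that

$${\sum_{k=0}^{n}}' k^2 = \frac{n(n+1)(2n+1)}{6} - \frac{n^2}{2} = \frac{n^3}{3} + \frac{n}{6} \tag{6-7.19}$$

and thus we obtain

$$\int_0^1 f(x)\,dx = \frac1n\Big[\tfrac12 f(0) + f\big(\tfrac1n\big) + \cdots + \tfrac12 f(1)\Big]
- \frac{1}{12n^2}\,[f'(1) - f'(0)] + \frac{1}{720\,n^4}\,[f'''(1) - f'''(0)] \tag{6-7.20}$$
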
We return to our original notations, 


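$$\bar A = h\left(\tfrac12 y_0 + y_1 + \cdots + y_{n-1} + \tfrac12 y_n\right) - \frac{h^2}{12}\,[f'(b) - f'(a)] \tag{6-7.21}$$

with the error estimate

$$\eta = \frac{h^4}{720}\,[f'''(b) - f'''(a)] \tag{6-7.22}$$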


This result shows that the correction

$$-\frac{h^2}{12}\,[f'(b) - f'(a)] \tag{6-7.23}$$


(which requires the knowledge of the derivative at the two endpoints 
of the range) greatly increases the accuracy of the simple trapezoidal 
rule. The new error, if compared with the estimated error (5.3) of
Simpson’s rule, is only 1/4 of that value and of opposite sign.

Use of the trapezoidal rule with end correction is particularly 
advocated if the given ordinates are the result of observations and 
thus afflicted by accidental errors. The simple arithmetic mean after 
halving the two extreme ordinates will tend to minimize the influence 
of these errors. Moreover, we will be able to lay a least-square 
parabola of second order through a suitable number of points at the 
one and the other end of the range. The derivative of this parabola at 
the end point will then provide us with the values f'(b) and f'(a) 
which can be used for the end correction (23). By this device we can 
considerably increase the accuracy of the simple trapezoidal method. 
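
As a minimal sketch of formulas (21) and (23), assuming Python and assuming that the two end derivatives f'(a) and f'(b) are supplied by the caller, the corrected rule may be written as follows. The function names are our own and purely illustrative; the test function e^x on [0, 4] with h = 0.5 is the one used in the numerical examples of § 8.

```python
import math

def trapezoid(y, h):
    """Simple trapezoidal rule: end ordinates halved, all others summed."""
    return h * (0.5 * y[0] + sum(y[1:-1]) + 0.5 * y[-1])

def trapezoid_end_corrected(y, h, dfa, dfb):
    """Trapezoidal rule plus the end correction -(h^2/12) [f'(b) - f'(a)]."""
    return trapezoid(y, h) - h * h / 12.0 * (dfb - dfa)

# Illustrative data: f(x) = e^x on [0, 4] with h = 0.5, so that f' = e^x too.
h = 0.5
ys = [math.exp(k * h) for k in range(9)]
plain = trapezoid(ys, h)
corrected = trapezoid_end_corrected(ys, h, math.exp(0.0), math.exp(4.0))
exact = math.exp(4.0) - 1.0

print(f"plain     {plain:.5f}  error {exact - plain:+.5f}")
print(f"corrected {corrected:.5f}  error {exact - corrected:+.5f}")
```

If the ordinates come from observation, the derivatives passed in would have to be estimated, for instance from a least-square parabola fitted near each end as described above; that step is not included in this sketch.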


8. Numerical examples. Problem I. As a numerical demonstration
of the operation of the various formulas we choose a simple example
which permits us to follow the general analytical procedures with
few technical complications. We choose the simple function

$$y = e^x$$

and assume h = 0.5. The range of integration shall extend from
x = 0 to x = 4. Taking the ordinates from a table, we have

x   | y
0   | 1
0.5 | 1.64872
1   | 2.71828
1.5 | 4.48169
2   | 7.38906
2.5 | 12.18249
3   | 20.08554
3.5 | 33.11545
4   | 54.59815


The theoretical value of the area under the curve is in this problem

$$A = \int_a^b f(x)\,dx = \int_0^4 e^x\,dx = \Big[e^x\Big]_0^4 = e^4 - 1$$

which gives A = 53.59815

The trapezoidal formula gives

$$\bar A = 0.5\left(\tfrac12\cdot 1 + 1.64872 + 2.71828 + \cdots + \tfrac12\cdot 54.59815\right) = 54.71015$$

The error of this result is

$$53.59815 - 54.71015 = -1.11200$$

The theoretical error estimate (6.5) gives

$$-\frac{h^2}{12}\,[y'(b) - y'(a)] = -\frac{1}{48}\,(54.59815 - 1) = -1.11663$$


We now apply Simpson’s formula to the same problem.

$$\bar A = \tfrac23\cdot 0.5\,\big[\tfrac12\cdot 1 + 2.71828 + 7.38906 + 20.08554 + \tfrac12\cdot 54.59815
+ 2\,(1.64872 + 4.48169 + 12.18249 + 33.11545)\big] = 53.61622$$

The new error is 53.59815 - 53.61622 = -0.01807.
The estimated error (5.3) becomes

$$-\frac{h^4}{180}\,[y'''(b) - y'''(a)] = -\frac{1}{2880}\,(54.598 - 1) = -0.01861$$

while (5.4) yields $\eta = -0.01816$.


The trapezoidal rule with end correction becomes 
54.71015 — 1.11663 = 53.59352 


This value is slightly less than the correct value, while Simpson’s rule
gave a value slightly more than the correct value. The new error is

$$53.59815 - 53.59352 = 0.00463$$

The estimated error is -1/4 of the error of Simpson’s formula, that is,

$$0.25 \cdot 0.01861 = 0.00465$$


In all these cases the predictions and the numerical results agree
very closely. We will now consider a problem which operates under
less favorable circumstances. 
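
The Simpson figures of Problem I can be checked with a short sketch. We assume Python and write Simpson's rule in its familiar 1, 4, 2, ..., 4, 1 form, which is equivalent to the grouping used above; the error estimate is the expression -(h^4/180)[y'''(b) - y'''(a)] quoted from (5.3). All names are illustrative.

```python
import math

def simpson(y, h):
    """Composite Simpson rule; requires an even number of panels."""
    if len(y) % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number of ordinates")
    odd = sum(y[1:-1:2])    # y1, y3, ...  (weight 4)
    even = sum(y[2:-1:2])   # y2, y4, ...  (weight 2)
    return h / 3.0 * (y[0] + y[-1] + 4.0 * odd + 2.0 * even)

h = 0.5
ys = [math.exp(k * h) for k in range(9)]
exact = math.exp(4.0) - 1.0

simpson_error = exact - simpson(ys, h)
# For y = e^x the third derivative is again e^x.
simpson_estimate = -h ** 4 / 180.0 * (math.exp(4.0) - math.exp(0.0))
# Estimated error of the end-corrected trapezoidal rule: -1/4 of the above.
corrected_trapezoid_estimate = -0.25 * simpson_estimate

print(simpson_error, simpson_estimate, corrected_trapezoid_estimate)
```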


Problem II. The function of Problem I was a “smooth” function,
i.e., a function whose successive differences decreased satisfactorily.
We will now choose a function whose Stirling expansion converges
much less satisfactorily because of the nearness of a singular point.
The given function shall be

$$y = \frac{1}{\sqrt{x}}$$

between the limits x = 0.1 and x = 1.7; the equidistant ordinates
shall follow each other at the distance h = 0.2.


x   | y
0.1 | 3.16228 
0.3 | 1.82574 
0.5 | 1.41421 
0.7 | 1.19523 
0.9 | 1.05409 
1.1 | 0.95346 
1.3 | 0.87706 
1.5 | 0.81650 
1.7 | 0.76696 


Theoretical value:

$$\int_{0.1}^{1.7} x^{-1/2}\,dx = 2\,\Big[x^{1/2}\Big]_{0.1}^{1.7} = 2\,(\sqrt{1.7} - \sqrt{0.1}) = 2\cdot 0.98756 = 1.97512$$

Application of the trapezoidal rule:

$$\bar A = 0.2\left(\tfrac12\cdot 3.16228 + 1.82574 + \cdots + \tfrac12\cdot 0.76696\right) = 0.2\cdot 10.10092 = 2.02018 \qquad [\text{error: } -0.0451]$$

Application of Simpson’s formula:

$$\bar A = \tfrac23\cdot 0.2\,\big[\tfrac12\cdot 3.16228 + 1.41421 + 1.05409 + 0.87706 + \tfrac12\cdot 0.76696
+ 2\,(1.82574 + 1.19523 + 0.95346 + 0.81650)\big] = 1.98558 \qquad [\text{error: } -0.010]$$

Trapezoidal rule with end correction:

$$2.02018 + \frac{0.04}{12}\left(\frac{0.76696}{3.4} - \frac{3.16228}{0.2}\right) = 1.96823 \qquad [\text{error: } 0.0069]$$



We observe in this example that the refinements of the simple 
trapezoidal method have now much less effect on the result than in the 
previous example. The order of magnitude of the error remained the 
same in all three methods. This is because the effect of the higher 
powers of h is counteracted by the strong increase of the higher 
derivatives. The nearness of the singular point x = 0 greatly reduces 
the effectiveness of the difference calculus by putting a much larger 
weight on the higher terms of the Stirling series, as we can see if we 
evaluate the estimated errors of the three formulas. 

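Estimated error of the trapezoidal formula:

$$-\frac{h^2}{12}\,[y'(b) - y'(a)] = \frac{0.04}{12}\left(\frac{0.767}{3.4} - \frac{3.162}{0.2}\right) = -0.052$$

Estimated error of Simpson’s formula (cf. 5.3):

$$A - \bar A = -\frac{0.0016}{180}\left(-\frac{15}{8}\,\frac{0.767}{1.7^3} + \frac{15}{8}\,\frac{3.162}{0.1^3}\right) = -0.0527$$

Estimated error of the corrected trapezoidal formula:

$$\eta = \frac{h^4}{720}\,[y'''(b) - y'''(a)] = 0.0132$$
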


The unreliable estimation of the error in the case of Simpson’s 
formula is caused by the unsmoothness of the function, which has 
the consequence that the third derivative and the third central 
difference do not agree even approximately in the neighborhood of 
the lower limit. The formula (5.4) gives more reliable results. Its 
application yields $\eta = -0.0139$, which overestimates the true error
$\eta = -0.010$, but to no undue degree.
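
The contrast between actual and estimated errors in Problem II can be reproduced with the following sketch. We assume Python; the derivative functions d1 and d3 are written out by hand for y = x^(-1/2), and the exact area is evaluated in floating point rather than taken from the rounded table, so the last digits may differ slightly from those printed above.

```python
import math

def f(x):
    return x ** -0.5

def d1(x):                      # first derivative of x^(-1/2)
    return -0.5 * x ** -1.5

def d3(x):                      # third derivative of x^(-1/2)
    return -15.0 / 8.0 * x ** -3.5

a, b, h, n = 0.1, 1.7, 0.2, 8
ys = [f(a + k * h) for k in range(n + 1)]
exact = 2.0 * (math.sqrt(b) - math.sqrt(a))

trap = h * (0.5 * ys[0] + sum(ys[1:-1]) + 0.5 * ys[-1])
simp = h / 3.0 * (ys[0] + ys[-1] + 4.0 * sum(ys[1:-1:2]) + 2.0 * sum(ys[2:-1:2]))
corr = trap - h * h / 12.0 * (d1(b) - d1(a))

# Derivative-based estimates of A - (approximation) for the three rules.
est_trap = -h * h / 12.0 * (d1(b) - d1(a))
est_simp = -h ** 4 / 180.0 * (d3(b) - d3(a))
est_corr = h ** 4 / 720.0 * (d3(b) - d3(a))

for name, value, est in (("trapezoidal", trap, est_trap),
                         ("Simpson", simp, est_simp),
                         ("corrected", corr, est_corr)):
    print(f"{name:12s} value {value:.5f}  actual {exact - value:+.5f}  estimated {est:+.5f}")
```

The third derivative at x = 0.1 is several thousand in magnitude, which is why the estimates built on it overshoot the errors actually committed.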


9. Approximation by polynomials of higher order. In Simpson’s 
formula we terminated the Stirling formula with the quadratic term. 
We can obviously go further and terminate the series with a term of 
higher order. Correspondingly the number of panels involved will 
increase. Since we do not want to lose the great advantage of 
symmetric limits, the next step after the limits $\pm 1$ will be the limits
$\pm 2$, involving four neighboring panels. This means five consecutive
data:

$$y_0,\ y_1,\ y_2,\ y_3,\ y_4 \tag{6-9.1}$$

from which central differences up to the fourth order can be formed.
Integrating between the limits $\pm 2$ we obtain

$$\begin{aligned}
\bar A &= \int_{-2}^{+2} f(2+t)\,dt \\
&= 4f(2) + \frac{\delta^2 f(2)}{2}\int_{-2}^{+2} t^2\,dt + \frac{\delta^4 f(2)}{24}\int_{-2}^{+2} t^2(t^2-1)\,dt \\
&= 4f(2) + \frac{8}{3}\,\delta^2 f(2) + \frac{14}{45}\,\delta^4 f(2) \\
&= \frac{2}{45}\,\big[90 f(2) + 60\,\delta^2 f(2) + 7\,\delta^4 f(2)\big] \\
&= \frac{2}{45}\,\big[7y_0 + 32y_1 + 12y_2 + 32y_3 + 7y_4\big]
\end{aligned} \tag{6-9.2}$$

If the width of the panels is h, the formula has to be multiplied by h,
and thus the final five-point formula becomes

$$\bar A = \frac{2h}{45}\,(7y_0 + 32y_1 + 12y_2 + 32y_3 + 7y_4) \tag{6-9.3}$$


The error of this approximation can again be estimated by the first 
neglected term of the Stirling expansion. 


$$\eta = \frac{\delta^6 f(2)}{720}\int_{-2}^{+2} t^2(t^2-1)(t^2-4)\,dt = -\frac{8}{945}\,\delta^6 f(2) \tag{6-9.4}$$

If the 6th difference coefficient is replaced by the 6th derivative, the
estimated error of the four-panel formula becomes

$$\eta = -\frac{8h^7}{945}\,f^{(6)}(2h) \tag{6-9.5}$$


In our previous numerical example the entire region was divided 
into 8 panels of equal width. Applying Simpson’s formula, we 
grouped these panels in the form of 4 double panels. We will now 
group them as a double group of 4 panels. Hence the weight factors 
of the successive ordinates become, considering the fact that $y_4$ is the
extreme right ordinate of the first group but simultaneously the
extreme left ordinate of the second group,

$$\frac{2h}{45}\,(7, 32, 12, 32, 14, 32, 12, 32, 7) \tag{6-9.6}$$

Applying these weights to the tabulated ordinates of Example I we
obtain

$$\bar A = \frac{1}{45}\cdot 2411.98706 = 53.59971$$

The error of this result is

$$A - \bar A = 53.59815 - 53.59971 = -0.00156$$


If we estimate the error on the basis of the formula (5), we have to 
remember that two sets of panels were employed. Accordingly the 
second factor has to be taken at the points 2h = 1 and 6h = 3 and 
their sum formed. 


$$f^{(6)}(1) + f^{(6)}(3) = 2.72 + 20.09 = 22.81$$

This gives

$$\eta = -\frac{8\cdot(0.5)^7}{945}\cdot 22.81 = -0.00151$$

The accuracy of this estimation is very satisfactory.

We notice that the fourth-order approximation decreased the error
by the factor 12 if compared with the second-order approximation of
Simpson’s rule. The gain is caused by the higher power of h, which is
here not counteracted by an unduly large increase of the higher
derivatives. Quite different is the situation in Problem II, where the
higher derivatives go up very rapidly for the small values of x. If the
weights (6) are applied to this problem, we obtain

$$\bar A = \frac{0.4}{45}\,\big[14\,(1.581139 + 1.054093 + 0.383482)
+ 32\,(1.825742 + 1.195229 + 0.953463 + 0.816497)
+ 12\,(1.414214 + 0.877058)\big] = 1.982818$$


The error is now —0.0077, which is only slightly less than the 
previous error, —0.010. 
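
A minimal sketch of the five-point formula (3), applied group by group as in (6), is given below. We assume Python; the function name is our own, and the ordinates are recomputed rather than read from the rounded table, so the last digit of the result may differ from the value printed above.

```python
import math

def five_point(y, h):
    """Four-panel rule (2h/45)(7, 32, 12, 32, 7), applied to successive groups."""
    if (len(y) - 1) % 4 != 0:
        raise ValueError("the number of panels must be a multiple of 4")
    total = 0.0
    for i in range(0, len(y) - 1, 4):
        y0, y1, y2, y3, y4 = y[i:i + 5]
        total += 7 * y0 + 32 * y1 + 12 * y2 + 32 * y3 + 7 * y4
    return 2.0 * h / 45.0 * total

# Problem I: y = e^x on [0, 4] with h = 0.5 (two groups of four panels).
h = 0.5
ys = [math.exp(k * h) for k in range(9)]
exact = math.exp(4.0) - 1.0
error = exact - five_point(ys, h)

# Estimate (5), summed over the two group centers x = 1 and x = 3.
estimate = -8.0 * h ** 7 / 945.0 * (math.exp(1.0) + math.exp(3.0))
print(error, estimate)
```

Shared ordinates such as y_4 pick up the weight 7 from each of the two adjacent groups, which reproduces the factor 14 in (6).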

Here we do not succeed with a reliable estimation of the error, in 
contrast to the previous example. The reason is that the error 




estimation (4) was based on the convergence of the Stirling expansion, 
and this presumes the smooth behavior of the higher differences. In 
our present example the nearness of the singular point at x = 0 pre- 
cludes the forming of the sixth difference. Consequently we cannot 
expect valid results from the application of the formula (5). On the 
other hand, the method of § 12, which does not require higher than 
first derivatives, operates again satisfactorily. Its application to the 
present problem gives the error estimate $\eta = -0.0127$, which
slightly, but not unduly, overestimates the true error $\eta = -0.0077$.

Generally we can say that a higher-order approximation is helpful 
only if at the same time h is chosen sufficiently small. We can 
greatly gain in accuracy if we operate with a sufficiently small h and 
a polynomial approximation of not too low order. It would be a 
mistake, however, to believe that we will always obtain great accuracy 
by approximating the entire region by one polynomial of the order 
N if the total number of ordinates is N + 1. This is prevented by the 
generally divergent behavior of equidistant polynomial approxima- 
tion, as discussed before (cf. V, 15). In practice the excellent accuracy 
of Simpson’s formula is usually satisfactory since it combines a 
relatively high power of h, viz., the fourth, with a derivative of 
relatively low order, viz., the third [cf. (5.3)]. In the case of smooth 
functions the four-panel formula (3) deserves attention and we want to 
add a useful six-panel formula, known as “Weddle’s rule.” This
formula operates with a sixth-order parabola, but makes a slight
error in the sixth difference, for the purpose of simpler weighting.

$$\bar A = \frac{3h}{10}\,(y_0 + 5y_1 + y_2 + 6y_3 + y_4 + 5y_5 + y_6) \tag{6-9.7}$$

Of considerable value is also the simple trapezoidal rule, augmented
by end correction [cf. (7.21)]. The error of this formula is only -1/4
of that of Simpson’s formula. It can be employed for checking 
purposes and is particularly useful if the given ordinates are not the 
result of calculation but of observation. 
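
Weddle's rule (7) can be sketched in the same manner. We assume Python and, as our own choice rather than something stated in the text, apply the rule to successive groups of six panels when more than seven ordinates are given.

```python
import math

WEDDLE = (1, 5, 1, 6, 1, 5, 1)

def weddle(y, h):
    """Weddle's six-panel rule (3h/10)(1, 5, 1, 6, 1, 5, 1), group by group."""
    if (len(y) - 1) % 6 != 0:
        raise ValueError("the number of panels must be a multiple of 6")
    total = 0.0
    for i in range(0, len(y) - 1, 6):
        total += sum(w * v for w, v in zip(WEDDLE, y[i:i + 7]))
    return 0.3 * h * total

# A fifth-order polynomial has no sixth difference, so the slight error
# made in the sixth difference does not affect it.
poly_area = weddle([float(k ** 5) for k in range(7)], 1.0)   # integral of x^5 on [0, 6]

# A smooth test function: e^x on [0, 3] with h = 0.5.
exp_area = weddle([math.exp(0.5 * k) for k in range(7)], 0.5)
print(poly_area, exp_area, math.exp(3.0) - 1.0)
```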


10. The Gaussian quadrature method. The eminent mathematician 
Gauss injected an entirely new and exceptionally ingenious idea into 
the customary theory of quadratures. This idea in its wider implica- 
tions was quite fundamental for many fields of practical analysis. We 
assume that a certain integrable function y = f(x) is not given at 


every point of the continuous variable x but only at certain selected
points $x_1, x_2, \cdots, x_n$ which shall lie inside of a given interval. Since
we are going to deal with a finite range only, we can immediately
normalize the range of interest. We will put the origin of the variable
x in the middle of the range considered and choose a scale factor
which brings the two end points of the range to the points $x = \pm 1$.
Hence we will now deal with the range

$$-1 \le x \le +1 \tag{6-10.1}$$

and assume that the points $x_1, x_2, \cdots, x_n$ in which the function
y = f(x) is given belong to this range. The ordinates

$$y_1 = f(x_1),\quad y_2 = f(x_2),\quad \cdots,\quad y_n = f(x_n) \tag{6-10.2}$$

are generally not enough for a determination of the function f(x),
no matter how large n may be. But we can try to interpolate f(x) for
intermediate points. For this purpose we may use the powers of x.
We can find a definite polynomial $p_{n-1}(x)$ of the order n - 1 which
has the property that it assumes the given values $y_k$ at the given
points $x_k$.

In the usual calculus of finite differences we assume that the chosen 
points $x = x_k$ are equidistantly spaced. Gauss conceived the idea that
we could possibly get much greater accuracy with the same number of 
ordinates if we did not fix their position in advance but utilized the 
distribution of the data points in some suitable fashion to our greatest 
advantage. By this procedure Gauss succeeded in obtaining not only 
a quadrature formula of extraordinary accuracy but also a procedure 
which is free of the dangers of equidistant polynomial interpolation, 
although these dangers were entirely unknown in his time. 

Let us assume that we leave the points of interpolation $x = x_k$
entirely free and want to determine the polynomial $y = p_{n-1}(x)$ which
will fit the given ordinates $y_1, y_2, \cdots, y_n$. The resulting formula is
known as “Lagrange’s interpolation formula.”¹ It is based on
constructing the fundamental polynomial

$$F_n(x) = (x - x_1)(x - x_2)\cdots(x - x_n) \tag{6-10.3}$$

and dividing it by synthetic division by the n root factors $(x - x_1)$,
$\cdots$, $(x - x_n)$. We thus obtain a set of polynomials,

$$Q_i(x) = \frac{F_n(x)}{F_n'(x_i)\,(x - x_i)} \qquad (i = 1, 2, \cdots, n) \tag{6-10.4}$$

which have the following properties: $Q_i(x)$ vanishes at all points
$x = x_k$ except $x = x_i$ where $Q_i(x)$ becomes 1. If we introduce
“Kronecker’s delta” $\delta_{ik}$, which is defined as 1 if i = k and 0 if i ≠ k,
we can write

$$Q_i(x_k) = \delta_{ik} \tag{6-10.5}$$

¹ Cf. [8], p. 84; [11], p. 86.


But then we see that a polynomial $p_{n-1}(x)$ constructed by the sum

$$p_{n-1}(x) = y_1 Q_1(x) + y_2 Q_2(x) + \cdots + y_n Q_n(x) \tag{6-10.6}$$

satisfies the condition that it assumes at any point $x = x_k$ the
prescribed ordinates $y = y_k$. The uniqueness of $p_{n-1}(x)$ follows from
the fact that the difference between $p_{n-1}(x)$ and a hypothetical
second polynomial $\bar p_{n-1}(x)$ would assume the values 0 at all n
points $x = x_k$. But the difference $p_{n-1}(x) - \bar p_{n-1}(x)$ is again a
polynomial of the order n - 1, and such a polynomial cannot have more
than n - 1 roots except by vanishing identically, which means
$\bar p_{n-1}(x) = p_{n-1}(x)$.

Now, if we consider $p_{n-1}(x)$ a sufficiently close approximation of
the given function y = f(x), we can obtain the area of the
unknown function f(x) parexically by evaluating

$$\bar A = \int_{-1}^{+1} p_{n-1}(x)\,dx = \sum_{k=1}^{n} y_k \int_{-1}^{+1} Q_k(x)\,dx \tag{6-10.7}$$

For any given distribution of the points $x = x_k$ the $Q_k(x)$ are uniquely
determined, and thus the definite integrals

$$\int_{-1}^{+1} Q_k(x)\,dx = w_k \tag{6-10.8}$$

will have some definite numerical values which can be tabulated. 


These values are entirely independent of the nature of the function 
y = f(x) in whose area we are interested. 
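
The construction (4) to (8) can be followed numerically. The sketch below, assuming Python and plain lists of ascending polynomial coefficients, builds each $Q_k(x)$ as a product of the factors $(x - x_j)/(x_k - x_j)$ and integrates it term by term between -1 and +1. The helper names are our own.

```python
def poly_mul(a, b):
    """Product of two polynomials given by ascending coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def integrate_unit_range(c):
    """Integral over [-1, +1] of sum c[k] x^k; the odd powers drop out."""
    return sum(2.0 * ck / (k + 1) for k, ck in enumerate(c) if k % 2 == 0)

def lagrange_weights(nodes):
    """Weights w_k = integral of Q_k(x) over [-1, +1] for arbitrary distinct nodes."""
    weights = []
    for k, xk in enumerate(nodes):
        q = [1.0]
        for j, xj in enumerate(nodes):
            if j != k:
                # multiply by the factor (x - x_j) / (x_k - x_j)
                q = poly_mul(q, [-xj / (xk - xj), 1.0 / (xk - xj)])
        weights.append(integrate_unit_range(q))
    return weights

# Three equidistant points reproduce the Simpson weights 1/3, 4/3, 1/3.
print(lagrange_weights([-1.0, 0.0, 1.0]))
```

The weights depend on the nodes alone, in agreement with the remark that they are independent of the function whose area is sought.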


Now the ingenious Gaussian quadrature method can be introduced
as follows. Let us add an additional point

$$x = x_{n+1}$$

to the previous points, without changing in any way the previous
points $x_k$. This will now introduce an additional root factor $x - x_{n+1}$
and generate an additional $Q_{n+1}(x)$. We can see from the definition
(4) of $Q_i(x)$ that this $Q_{n+1}(x)$ will be proportional to the previous
$F_n(x)$, since the new root factor $(x - x_{n+1})$ drops out. Hence the
weight factor $w_{n+1}$ by which the new ordinate $y_{n+1}$ has to be multiplied
will be proportional to the definite integral

$$\int_{-1}^{+1} F_n(x)\,dx \tag{6-10.9}$$

Similarly, if m new points

$$x_{n+1},\ x_{n+2},\ \cdots,\ x_{n+m} \tag{6-10.10}$$

are introduced together with their ordinates, the corresponding
weights $w_{n+1}, w_{n+2}, \cdots, w_{n+m}$ are determined by a definite integral of
the type

$$w_{n+i} = \int_{-1}^{+1} F_n(x)\,G^{(i)}_{m-1}(x)\,dx \tag{6-10.11}$$

where these $G^{(i)}_{m-1}(x)$ are some polynomials of the order m - 1. Now
all these weights will become automatically zero if we let $F_n(x)$ satisfy
the following integral conditions:

$$\int_{-1}^{+1} F_n(x)\,dx = 0,\quad \cdots,\quad \int_{-1}^{+1} F_n(x)\,x^{m-1}\,dx = 0 \tag{6-10.12}$$

in view of the fact that an arbitrary polynomial $G_{m-1}(x)$ is a linear
superposition of the powers $1, x, x^2, \cdots, x^{m-1}$.

In actual fact we can go up to m = n by requiring the integral
conditions

$$\int_{-1}^{+1} F_n(x)\,x^{\alpha}\,dx = 0 \qquad (\alpha = 0, 1, 2, \cdots, n-1) \tag{6-10.13}$$
The result is that we can add freely any n points to our originally given
n points, and yet none of the new ordinates will change anything in
the result obtained before. Hence in effect we operated with 2n
ordinates and yet in fact we used only n ordinates, since all the
additional ordinates contributed nothing to the area to be evaluated.
By this procedure we save n terms in the sum

$$\bar A = \sum_{k=1}^{2n} y_k w_k$$

but even more important is the fact that we need not even know the
additional ordinates $y_{n+1}, y_{n+2}, \cdots, y_{2n}$. The sum

$$\bar A = \sum_{k=1}^{n} y_k w_k \tag{6-10.14}$$


gives the area with the help of n ordinates and yet with an accuracy 
as if 2n ordinates had been used. 

Integral conditions of the type (13) are called “orthogonality
conditions.” We say that the polynomial $F_n(x)$ is “orthogonal” to
the powers $1, x, x^2, \cdots, x^{n-1}$. We have encountered such conditions
earlier when dealing with the “orthogonal function systems”
(cf. V, 16). We have studied the “Jacobi polynomials” (cf. V, 20),
which have the property that they are orthogonal to all powers of
lower order, exactly in the sense of the conditions (13). However,
generally the orthogonality involves a weight factor p(x) in the
integrand. Only in the special case of the “Legendre polynomials”
(cf. 5-20.11) does it happen that the weight factor becomes 1 and thus
weighted orthogonality changes into simple orthogonality. The
choice of $F_n(x)$ is thus decided; the Gaussian program requires that
$F_n(x)$ shall be identified with the nth Legendre polynomial $P_n(x)$.
The zeros of these polynomials give us the points at which the
function f(x) has to be prescribed. They have been tabulated with
great accuracy, together with the numerical values of the coefficients
$w_i$ which can be calculated by evaluating the definite integrals (8)
(cf. also § 13).¹
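
The orthogonality conditions (13) can be verified directly for n = 5. The sketch below, assuming Python, forms the fundamental polynomial from its roots and evaluates the integrals of $F_n(x)\,x^{\alpha}$ in closed form. The five Gaussian zeros are entered as standard tabulated constants; the equidistant set is included only for contrast.

```python
def poly_from_roots(roots):
    """Ascending coefficients of F_n(x) = (x - r_1)(x - r_2)...(x - r_n)."""
    c = [1.0]
    for r in roots:
        # new[k] = c[k-1] - r * c[k], i.e. multiplication by (x - r)
        c = [lo - r * hi for lo, hi in zip([0.0] + c, c + [0.0])]
    return c

def moment(c, alpha):
    """Integral over [-1, +1] of F(x) * x^alpha, F given by ascending coefficients."""
    return sum(2.0 * ck / (k + alpha + 1)
               for k, ck in enumerate(c) if (k + alpha) % 2 == 0)

gauss_zeros = [-0.9061798459386640, -0.5384693101056831, 0.0,
               0.5384693101056831, 0.9061798459386640]
equidistant = [-1.0, -0.5, 0.0, 0.5, 1.0]

F_gauss = poly_from_roots(gauss_zeros)
F_equi = poly_from_roots(equidistant)

for alpha in range(5):
    print(alpha, moment(F_gauss, alpha), moment(F_equi, alpha))
```

For the Gaussian zeros all five integrals vanish to rounding accuracy, while for the equidistant points the integral with $\alpha = 1$ does not.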


11. Numerical example. We want to apply the Gaussian method 
to the same numerical examples which we considered earlier in § 8. 


¹ Cf. [2]. The labor of calculating the coefficients $w_i$ is reduced to one-half by
the symmetry properties of the Legendre polynomials. The roots appear in
pairs $\pm\xi_i$; the weights belonging to two such points are equal.

The first example represents a very smooth, the second a very
unsmooth function. The rapidity of convergence with increasing n is
thus very different in the two cases. In our previous equidistant
procedures, 9 equidistant ordinates were used. We will now replace
them by only 5 nonequidistant ordinates. Since the tabulation of the
Gaussian zeros assumes the interval [-1, 1], we have to adjust an
arbitrary interval to this normalization. We do that by the
transformation

$$x = \frac{b+a}{2} + \frac{b-a}{2}\,\xi \tag{6-11.1}$$

$$\int_a^b f(x)\,dx = \frac{b-a}{2}\int_{-1}^{+1} f(x)\,d\xi \tag{6-11.2}$$


In our first problem a = 0, b = 4. Hence the ordinates have to be
read at the points

$$x_i = 2 + 2\xi_i = 2\,(1 + \xi_i) \tag{6-11.3}$$

For the choice n = 5 we obtain the five zeros,

$$2\,(1 \pm 0.906179846),\qquad 2\,(1 \pm 0.538469310),\qquad 2$$

The points of interpolation, together with the associated weights $w_i$,
thus become (using 8 decimal place accuracy)

$$\begin{array}{ll}
x_1 = 0.18764031 & w_1 = 0.47385377 \\
x_2 = 0.92306138 & w_2 = 0.95725734 \\
x_3 = 2 & w_3 = 1.13777778 \\
x_4 = 3.07693862 & w_4 = 0.95725734 \\
x_5 = 3.81235969 & w_5 = 0.47385377
\end{array} \tag{6-11.4}$$

These weights were obtained by multiplying the Gaussian weights by
(b - a)/2 = 2. The ordinates of the function $y = e^x$ at these points


Mathematical Tables Project, New York, after making the proper
interpolations. These ordinates become

$$\begin{aligned}
y_1 &= 1.20639950 \\
y_2 &= 2.51698405 \\
y_3 &= 7.38905610 \\
y_4 &= 21.69189349 \\
y_5 &= 45.25710562
\end{aligned}$$

Multiplying these ordinates by the weight factors of the table (4) and
summing, we obtain

$$\bar A = 53.59813663$$

against the true value,

$$A = e^4 - 1 = 53.59815003$$

The error of the approximation is

$$\eta = 0.0000134$$

The previous approximation (9.6) by 9 ordinates gave the much
larger error

$$\eta = -0.0016$$


We see that the Gaussian method gives an admirable accuracy. In 
spite of operating with 5 instead of 9 ordinates, the error decreased 
by a factor of more than 100. We thus get the impression that the 
operation with uneven intervals performs even more than it promises. 
It seems to have a beneficial effect on the error, even beyond the
saving of ordinates. This is indeed the case. Equidistant interpolation
is generally not a well-convergent process, and for functions
which have singularities inside the unit circle (although they may be
entirely smooth between -1 and +1), the convergence is generally
not even guaranteed (cf. V, 15). The Gaussian quadrature process
uses the zeros of the Legendre polynomials, i.e., the zeros of an
orthogonal set of functions, as points of interpolation. The convergence
of this process is guaranteed by the general nature of
orthogonal expansions (cf. V, 21).
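
The first Gaussian example can be retraced with the following sketch, assuming Python. The transformation (11.1) is applied inside the function; the five zeros and weights on [-1, +1] are entered as standard tabulated constants, to more decimals than the table (4) carries, and the ordinates are computed directly instead of being interpolated from tables.

```python
import math

# Five-point Gauss-Legendre zeros and weights on [-1, +1].
XI5 = (-0.9061798459386640, -0.5384693101056831, 0.0,
       0.5384693101056831, 0.9061798459386640)
W5 = (0.2369268850561891, 0.4786286704993665, 0.5688888888888889,
      0.4786286704993665, 0.2369268850561891)

def gauss(f, a, b, xi=XI5, w=W5):
    """Gaussian quadrature on [a, b] by the substitution x = (b+a)/2 + (b-a)/2 * xi."""
    half, mid = 0.5 * (b - a), 0.5 * (b + a)
    return half * sum(wk * f(mid + half * t) for t, wk in zip(xi, w))

area = gauss(math.exp, 0.0, 4.0)
exact = math.exp(4.0) - 1.0
print(f"area {area:.8f}  error {exact - area:.7f}")
```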

The Gaussian quadrature method is thus superior to the ordinary 
equidistant methods for two reasons. One is that n ordinates are 




comparable in effectiveness to 2n equidistant ordinates. The other is 
that interpolation by Legendre polynomials is much more convergent 
than interpolation by Lagrangian polynomials. 

In our second example of § 8 even the Gaussian quadrature has 
slow convergence. But once more we can demonstrate the power of 
the Gaussian method in the saving of ordinates. This time we want 
to use but four ordinates, instead of the original nine. The limits now 
are 

$$a = 0.1,\qquad b = 1.7$$

Hence $x_i = 0.9 + 0.8\,\xi_i$.
The four points of interpolation become

$$0.9 \pm 0.8\cdot 0.33998104,\qquad 0.9 \pm 0.8\cdot 0.86113631$$

and we can set up the table

$$\begin{array}{ll}
x_1 = 0.21109095 & w_1 = 0.34785484 \\
x_2 = 0.62801516 & w_2 = 0.65214515 \\
x_3 = 1.17198483 & w_3 = 0.65214515 \\
x_4 = 1.58890905 & w_4 = 0.34785484
\end{array}$$

This time we have copied the weight factors $w_i$ unchanged, since it is
numerically simpler to obtain the result and then multiply by
(b - a)/2 = 0.8 than to multiply every weight by that factor.
The ordinates $y_i$ of the function $y = x^{-1/2}$ at the points of
interpolation are now

$$\begin{aligned}
y_1 &= 2.17653268 \\
y_2 &= 1.26187092 \\
y_3 &= 0.92371714 \\
y_4 &= 0.79332380
\end{aligned}$$

Multiplying by the corresponding weight factors $w_i$ and summing
gives

$$2.45839962$$

Hence $\bar A = 0.8 \cdot 2.45839962 = 1.96671970$


The correct value of the area is here
A = 1.97512
which gives the error $\eta = 0.0084$

This is only slightly more than the error obtained in § 9 by using 9
ordinates:

$$\eta = -0.0077$$


In the previous case we divided the range of integration into two 
panels and used in each panel an approximating polynomial of 
fourth order. In the present case the entire range seems to be 
approximated by a polynomial of only third order. In actual fact a 
polynomial of seventh order is used because each of our points 
counts actually as a double point. We lay a parabola of seventh order 
through eight specially chosen points. The points of interpolation 
are not merely the four zeros of $P_4(\xi)$. They are in fact four pairs of
points, but each pair lies close together and in the limit collapses into
one point. The four zeros of $P_4(\xi)$ actually stand for the eight zeros
of $P_4^2(\xi)$.
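
The statement that four Gaussian ordinates act like a polynomial of seventh order can be tested on the powers of $\xi$: the four-point formula still integrates $\xi^6$ without error but fails on $\xi^8$. The sketch below, assuming Python and standard tabulated four-point constants, checks this and then repeats the second example.

```python
# Four-point Gauss-Legendre zeros and weights on [-1, +1].
XI4 = (-0.8611363115940526, -0.3399810435848563,
       0.3399810435848563, 0.8611363115940526)
W4 = (0.3478548451374538, 0.6521451548625461,
      0.6521451548625461, 0.3478548451374538)

def gauss4(f, a, b):
    """Four-point Gaussian quadrature on [a, b]."""
    half, mid = 0.5 * (b - a), 0.5 * (b + a)
    return half * sum(w * f(mid + half * t) for t, w in zip(XI4, W4))

# Powers of xi on [-1, +1]: the exact integral of xi^k is 2/(k+1) for even k.
defect6 = 2.0 / 7.0 - gauss4(lambda t: t ** 6, -1.0, 1.0)
defect8 = 2.0 / 9.0 - gauss4(lambda t: t ** 8, -1.0, 1.0)

# Second example: y = x^(-1/2) between 0.1 and 1.7.
area = gauss4(lambda x: x ** -0.5, 0.1, 1.7)
print(defect6, defect8, area)
```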


12. The error of the Gaussian quadrature. The more effective a 
certain method of parexic analysis is, the more difficult it usually is to 
obtain a satisfactory estimate of the accuracy obtained. In the case 
of the Gaussian quadrature, estimation of the error is not easy. The 
traditional formula which estimates the error of the Gaussian 
quadrature requires knowledge of the 2nth derivative of f(x) throughout
the interval of integration.¹ This formula is

$$\eta = \frac{2^{2n+1}}{2n+1}\left[\frac{(n!)^2}{(2n)!}\right]^2 \frac{f^{(2n)}(\theta)}{(2n)!}
\qquad (\theta = \text{some unknown point between } \pm 1) \tag{6-12.1}$$

This estimate has several drawbacks. From the theoretical standpoint
we can object to the assumption that f(x) possesses 2n derivatives
in the interval of integration since the Gaussian quadrature converges
to the proper value even in the case of nonanalytical functions, such
as for example $\sqrt{|x|}$, whose first derivative becomes already
infinite at x = 0 and yet is perfectly amenable to Gaussian quadrature.
From the practical standpoint it is usually difficult (except in
simple cases) to evaluate the 2nth derivative of a function, even if its
analytical form is given. But frequently f(x) is given only in tabulated
form, and the analytical expression of f(x) is unknown.

¹ Cf. [2], p. 740.

The following procedure is free of these objections. If the 
Gaussian quadrature is applied to f'(x), we know that the result 
should be 


$$\int_{-1}^{+1} f'(x)\,dx = f(1) - f(-1)$$


Hence in this case we can check up explicitly on the error of the 
Gaussian quadrature. For the proper exploitation of this idea we 
have to consider, however, that the quadrature between the limits 
-1 and +1 involves only the even part of the function, viz.,
f(x) + f(-x), while the quadrature of the derivative would involve
an entirely different function, viz., the odd part of f(x), which is
f(x) - f(-x). We avoid this difficulty by taking the derivative of
x f(x), which has the same symmetry character as f(x) itself:

$$\int_{-1}^{+1} [x f(x)]'\,dx = f(1) + f(-1) \tag{6-12.2}$$

Since $(xf)' = xf' + f$, we see that the numerical procedure is simple.
We multiply the previous weights $w_i$ by $\xi_i f'(\xi_i)$ instead of $f(\xi_i)$. The
error of the new quadrature is now given as follows:

$$\eta' = f(1) + f(-1) - \bar A - \sum_{i=1}^{n} w_i\,\xi_i\,f'(\xi_i) \tag{6-12.3}$$

In the case of general limits a and b (cf. 11.1),

$$\eta' = \frac{b-a}{2}\,[f(a) + f(b)] - \bar A - \left(\frac{b-a}{2}\right)^2 \sum_{i=1}^{n} w_i\,\xi_i\,f'(x_i) \tag{6-12.4}$$


Now a closer investigation of the error of the Gaussian quadrature
reveals that for functions which are not too unsmooth between -1
and +1 the point $\theta$ of the formula (1) is near to the origin $\xi = 0$. But
then the last factor of the formula is practically equal to the
coefficient of $\xi^{2n}$ in the Taylor expansion around the origin. Now
the expansion of $(\xi f)'$ is identical with the original expansion except
that $a_{2n}$ is multiplied by 2n + 1. Under these conditions we get the
error estimate

$$\eta = \frac{\eta'}{2n+1} \tag{6-12.5}$$


The great advantage of this estimate is that it requires the first 
derivative of f (x) only, instead of the 2nth derivative of the traditional 
formula. We would think that the 2nth derivative of f (x) is necessary, 
considering the fact that the quadrature is accurate for any poly- 
nomial whose order is less than 2n. In order to eliminate such a 
polynomial, we must differentiate 2n times. But the formula (5), 
although we have differentiated only once, does not lose out on this 
account either. If f(x) is any polynomial of an order less than 2n, the 
function $(xf)'$ is again a polynomial of the same type. Hence the
second quadrature gives the error zero, and the estimate (5) becomes 
likewise zero. 

If f(x) does not have the smoothness required by the above 
argument, we can assume that the differentiation will increase rather 
than decrease the unsmoothness of the function. Hence we can 
assume that the estimation according to (5) will tend to overestimate 
the error and thus we will be on the safe side, even if our estimate is 
not too realistic. We cannot be sure, however, in the case that f'(x)
changes its sign in the given interval.¹

We will now apply this method of estimating the error of the 
Gaussian quadrature to the two numerical examples of § 11. In the 
first example we obtain 


$$\eta' = 2\cdot 55.59815003 - 53.59813663 - 57.59801484 = 0.0001487$$

Dividing by 2n + 1 = 11 we get

$$\eta = 0.0000135$$

which agrees perfectly with the actual error. Here the function was
very smooth. We now come to the second example where the
function is much less smooth inasmuch as the higher derivatives
increase strongly in a certain range of the critical interval. We now
obtain

$$\eta' = 0.8\cdot 3.929242650 - 1.96671970 - 1.02713980 = 0.149535$$

Here the number of points was n = 4 and thus we have to divide by 9:

$$\eta = 0.01661$$

The actual error is only one-half of this number. But this is still a
satisfactory estimate and it is fortunate that we have overestimated
the error.

¹ The exact safety limits of this procedure are not yet established.
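
The estimate (4), (5) lends itself to a compact routine. The following sketch, assuming Python, returns both the Gaussian area and the estimated error; it needs only f and its first derivative, the latter supplied here analytically. The zeros and weights are standard tabulated constants and the function name is our own.

```python
import math

XI5 = (-0.9061798459386640, -0.5384693101056831, 0.0,
       0.5384693101056831, 0.9061798459386640)
W5 = (0.2369268850561891, 0.4786286704993665, 0.5688888888888889,
      0.4786286704993665, 0.2369268850561891)
XI4 = (-0.8611363115940526, -0.3399810435848563,
       0.3399810435848563, 0.8611363115940526)
W4 = (0.3478548451374538, 0.6521451548625461,
      0.6521451548625461, 0.3478548451374538)

def gauss_with_estimate(f, df, a, b, xi, w):
    """Return (area, eta), eta being the estimate eta'/(2n+1) of (6-12.5)."""
    half, mid = 0.5 * (b - a), 0.5 * (b + a)
    xs = [mid + half * t for t in xi]
    area = half * sum(wk * f(x) for wk, x in zip(w, xs))
    s = sum(wk * t * df(x) for wk, t, x in zip(w, xi, xs))
    eta_prime = half * (f(a) + f(b)) - area - half * half * s   # formula (6-12.4)
    return area, eta_prime / (2 * len(xi) + 1)

area1, eta1 = gauss_with_estimate(math.exp, math.exp, 0.0, 4.0, XI5, W5)
area2, eta2 = gauss_with_estimate(lambda x: x ** -0.5,
                                  lambda x: -0.5 * x ** -1.5,
                                  0.1, 1.7, XI4, W4)
print(area1, eta1)
print(area2, eta2)
```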


13. The coefficients of a quadrature formula with arbitrary zeros. 
The coefficients of an arbitrary quadrature formula (with or without 
a weight factor p(x)) can be evaluated by a simple numerical scheme. 
We know that if we interpolate f(x) at the n points $x = \xi_1, \xi_2, \cdots, \xi_n$,
the interpolation will be exact for any polynomial whose order is
lower than n. In particular the successive powers $1, x, x^2, \cdots, x^{n-1}$ will
be interpolated without any error. Hence the quadrature associated
with all these powers will also be exact. We assume that we have
evaluated the n definite integrals:

$$u_k = \int_a^b p(x)\,x^k\,dx \tag{6-13.1}$$

Now the general form of a quadrature formula is

$$\bar A = \sum_{\alpha=1}^{n} w_\alpha\, f(\xi_\alpha) \tag{6-13.2}$$

and since in our special case $\bar A$ coincides with the exact value of the
area A, we obtain the following n equations:

$$\begin{aligned}
w_1 + w_2 + \cdots + w_n &= u_0 \\
w_1\xi_1 + w_2\xi_2 + \cdots + w_n\xi_n &= u_1 \\
&\ \ \vdots \\
w_1\xi_1^{n-1} + w_2\xi_2^{n-1} + \cdots + w_n\xi_n^{n-1} &= u_{n-1}
\end{aligned} \tag{6-13.3}$$

These n linear equations are sufficient for a unique determination of
the $w_\alpha$. We have dealt with this problem of “weighted moments”
earlier (cf. IV, 23) and obtained a simple numerical algorithm for 


its solution (cf. 4-23.11, 12, 13). This algorithm is applicable to our
problem and yields a simpler method for the evaluation of the $w_\alpha$
than the explicit construction of the definite integrals (10.8).
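
The equations (3) can also be solved by brute force. The sketch below, assuming Python, does not reproduce the algorithm of IV, 23; it substitutes a plain Gaussian elimination with pivoting, which is adequate for the small systems considered here. The moments are those of the weight factor p(x) = 1 between -1 and +1.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def weights_from_moments(nodes, moments):
    """Solve sum_i w_i * xi_i^k = u_k for k = 0, ..., n-1."""
    n = len(nodes)
    matrix = [[xi ** k for xi in nodes] for k in range(n)]
    return solve_linear(matrix, moments)

def unit_moments(n):
    """u_k = integral of x^k over [-1, +1]: 2/(k+1) for even k, 0 for odd k."""
    return [2.0 / (k + 1) if k % 2 == 0 else 0.0 for k in range(n)]

print(weights_from_moments([-1.0, 0.0, 1.0], unit_moments(3)))
```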


14. Gaussian quadrature with rounded-off zeros. From the practical 
angle the Gaussian quadrature suffers from one serious drawback. 
The function has to be evaluated at irrational points. This requires 
heavy interpolation, which is a cumbersome procedure. In the case of 
tabulated functions it could easily happen that it would take much 
more effort to obtain n interpolated ordinates than to read off 
directly 2n equidistant ordinates. For this reason the Gaussian 
method is usually employed only if the evaluation of every y, requires 
an independent calculation, because of absence of any tabulation. 

This drawback of the Gaussian method can be remedied by the 
following modification of the original procedure. We round off the 
Gaussian zeros to a small number of decimal places, perhaps two or 
three, and evaluate the weight factors associated with these shifted 
zeros. The process of interpolation is then greatly simplified or even 
obviated if we possess tables of f(x) which proceed in units of 0.01 or 
perhaps 0.001 of the argument. It is true that the full accuracy of the 
Gaussian procedure is not available in this manner, but the accuracy 
is still high. In fact, by an additional correction scheme, considered 
in the next section, the full accuracy of the Gaussian method may be 
maintained. 

The coefficients w, associated with the shifted zeros can be 
evaluated according to the general method of § 13. We first construct 
the fundamental polynomial F,,(é) with the help of the root factors. 
Let us choose for example n = 5. Here the five Gaussian zeros are

$$\xi = \pm 0.53846\cdots,\quad 0,\quad \pm 0.90617\cdots$$

We round them off to two decimal places:

$$\xi = \pm 0.54,\quad 0,\quad \pm 0.91$$

and construct the fundamental polynomial out of the root factors:

$$F_5(\xi) = \xi\,(\xi^2 - 0.54^2)(\xi^2 - 0.91^2) = \xi^5 - 1.1197\,\xi^3 + 0.24147396\,\xi$$

Comparison with the fifth Legendre polynomial

$$P_5(\xi) = \tfrac18\,(63\xi^5 - 70\xi^3 + 15\xi)$$

requires multiplication by 63 and the factor 1/8 in front:

$$\tfrac18\,(63\xi^5 - 70.5411\,\xi^3 + 15.21285948\,\xi)$$

We see that the two polynomials have nearly equal coefficients. The 
Y, quantities of the equations (4-23.8) have now the meaning of the 
definite integrals (13.1) with p(x) = 1: 

$$\int_{-1}^{+1} \xi^k\,d\xi = \begin{cases} \dfrac{2}{k+1} & (k = 0, 2, 4, \ldots) \\[4pt] 0 & (k = 1, 3, 5, \ldots) \end{cases}$$
giving rise to the reciprocal polynomial 

$$2 + \tfrac{2}{3}\xi^2 + \tfrac{2}{5}\xi^4$$

by which $F_5(\xi)$ has to be multiplied, according to the scheme (4-23.11):

     0.4829479200,  0,  −2.2394000000,  0,  2
    −0.7464666667,  0,   0.6666666667
     0.4

     0.1364812533,  0,  −1.5727333333,  0,  2

Hence

$$G_4(\xi) = 0.1364812533 - 1.5727333333\,\xi^2 + 2\xi^4$$

and

$$\frac{G_4(\xi)}{F_5'(\xi)} = \frac{0.1364812533 - 1.5727333333\,\xi^2 + 2\xi^4}{0.24147396 - 3.3591\,\xi^2 + 5\xi^4}$$

Substitution of $\xi = \pm 0.91, \pm 0.54, 0$ yields the five weights $w_i$ of the
quadrature formula:¹

    ξ = ±0.91    w = 0.231387878
        ±0.54        0.486011767
         0           0.565200708
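
The scheme above can be mirrored in a few lines of code. The following is a minimal Python sketch under stated assumptions, not a transcription of the tabulated procedure (4-23.11): it builds $F_n$ from the root factors, forms the polynomial $G_{n-1}$ from the moments $2/(k+1)$ of the range [-1, 1], and evaluates $G_{n-1}(\xi_i)/F_n'(\xi_i)$ at each node. The helper names `poly_mul`, `poly_eval` and `weights_for_nodes` are illustrative and do not come from the text.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest power first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_eval(p, x):
    """Horner evaluation; coefficients are stored lowest power first."""
    acc = 0.0
    for c in reversed(p):
        acc = acc * x + c
    return acc


def weights_for_nodes(nodes):
    """Interpolatory quadrature weights on [-1, 1] for distinct nodes."""
    n = len(nodes)
    # Fundamental polynomial F(x) = product of the root factors (x - x_k).
    F = [1.0]
    for x in nodes:
        F = poly_mul(F, [-x, 1.0])
    # Moments of the weight function 1 on [-1, 1]: 2/(k+1) for even k, else 0.
    moments = [2.0 / (k + 1) if k % 2 == 0 else 0.0 for k in range(n)]
    # G(x) = integral of (F(x) - F(t)) / (x - t) dt, assembled term by term.
    G = [0.0] * n
    for m in range(1, n + 1):
        for j in range(m):
            G[m - 1 - j] += F[m] * moments[j]
    dF = [k * F[k] for k in range(1, n + 1)]  # derivative F'(x)
    return [poly_eval(G, x) / poly_eval(dF, x) for x in nodes]


if __name__ == "__main__":
    for xi, w in zip([-0.91, -0.54, 0.0, 0.54, 0.91],
                     weights_for_nodes([-0.91, -0.54, 0.0, 0.54, 0.91])):
        print(f"{xi:+.2f}  {w:.9f}")
```

Because only the list of nodes enters, the same function can be tried with zeros rounded to three decimals, or with the unrounded Gaussian zeros as a cross-check.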


We apply these weights to our standard example of $y = e^x$ treated
before (cf. § 11) with the exact Gaussian zeros. The new ordinates
become

    x = 0.18    y =  1.19721736
        0.92         2.50929039
        2            7.38905610
        3.08        21.75840240
        3.82        45.60420832

The weights have to be multiplied by 2, because of the double range
of x. The sum of weighted ordinates becomes

$$A = 53.59993314$$

against the true value

$$A = 53.59815003$$

The error of the approximation is thus

$$\eta = -0.00178311$$

Compared with the Gaussian error, the error has increased by a
factor of about 130, which shows the great sensitivity of the Gaussian
method to even small shifts of the zeros. Nevertheless, the accuracy is
still considerable.

The estimation of the error on the basis of the method discussed
in § 12 still gives the correct sign and order of magnitude. We now obtain

$$\eta' = 111.19630006 - 53.59993314 - 57.6093084 = -0.0129415$$

Division by 11 gives

$$\eta = -0.00118$$

which is about two-thirds of the true error given above.
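
The weighted sum of this section is easy to evaluate by machine. The sketch below is a minimal illustration, assuming the five rounded zeros and the weights tabulated above; the function name `rounded_gauss_5` is hypothetical. It maps the range [a, b] onto [-1, 1], which is where the factor $(b-a)/2 = 2$ on the weights comes from, and compares the result with $e^4 - 1$.

```python
import math

# Rounded zeros and their weights on [-1, 1], as tabulated in this section.
XI = [-0.91, -0.54, 0.0, 0.54, 0.91]
W = [0.231387878, 0.486011767, 0.565200708, 0.486011767, 0.231387878]


def rounded_gauss_5(f, a, b):
    """Five-point quadrature with rounded-off zeros on the range [a, b]."""
    half = (b - a) / 2.0   # scale factor of the linear change of variable
    mid = (a + b) / 2.0
    return half * sum(w * f(mid + half * xi) for w, xi in zip(W, XI))


approx = rounded_gauss_5(math.exp, 0.0, 4.0)
exact = math.exp(4.0) - 1.0
error = exact - approx     # same sign convention as the text: true minus computed

if __name__ == "__main__":
    print(f"A     = {approx:.8f}")
    print(f"true  = {exact:.8f}")
    print(f"error = {error:.8f}")
```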


1 A systematic table for the operation with rounded-off zeros was prepared by
the Mathematical Tables Project, New York City, and is reprinted as Table XIV
of the Appendix, by permission of the Project.


15. The use of double roots. In § 11 the remark was made that the
great efficiency of the Gaussian method is explainable on the basis
that the fundamental polynomial is not $P_n(\xi)$ but actually $P_n^2(\xi)$.
Every point of the interpolation may be counted as a double point,
for it is in the nature of Gaussian quadrature that n points can
be added freely to the points of interpolation without changing
anything, since the new points enter the quadrature formula with the
weight zero. Let us now assume that we choose an arbitrary set of
n points within the range but choose every point as a double root of
$F_n(\xi)$. Since double roots are equivalent to two close points which in
the limit collapse into one, interpolation by n double points has the
significance that at every point of interpolation the functional value
$f(x_i)$ and its derivative $f'(x_i)$ are given. The Gaussian points are now
chosen in such particular fashion that the weights of the derivatives
shall become zero. The quadrature formula is thus reduced to n
instead of 2n terms.

If now the Gaussian zeros are slightly out of focus, the weight 
factors of the derivatives will not vanish any more but they will 
remain small. Hence it will not be necessary to know the derivatives 
with great accuracy. If the function is tabulated in sufficiently close 
intervals, the mere difference coefficient between two neighboring 
tabular values can take the place of the derivative. In this fashion we 
can round off the Gaussian zeros to convenient numbers, thus 
avoiding the inconvenience of interpolation, and still maintain the 
full accuracy of the Gaussian procedure. 

As a numerical example we return once more to our previous
example of the exponential function, using five Gaussian points
(cf. § 11). We round off these points to $\xi = \pm 0.54$ and $\xi = \pm 0.90$
(sacrificing the slightly better 0.91 in favor of a more convenient
value). The coefficients $w_i$ and $w_i'$ can again be evaluated according
to the numerical scheme of the previous section, but raising the order
of $F_n(\xi)$ to 10 by squaring. Accordingly the polynomial $G_{n-1}(\xi)$ will
be of the order 9, but in actual fact we get a polynomial of the order
4 in $\xi^2$ since all the odd powers of $\xi$ drop out. As a result we obtain the
five weights $w_i$ of the ordinates $f(\xi_i)$, augmented by five weights $w_i'$
of the derivatives $f'(\xi_i)$:¹

    ξ = −0.90    w = 0.23640530    w' = −0.00155377
        −0.54        0.47899553          0.00058042
         0           0.56919830          0
         0.54        0.47899553         −0.00058042
         0.90        0.23640530          0.00155377


1 A systematic table of the weights $w_i$ and $w_i'$ for the operation with the ordinates
and their derivatives at the rounded off Gaussian zeros is not available at the
present time.




In our example the change of the limits has the consequence that the
points of interpolation become

$$x = 0.2, \quad 0.92, \quad 2, \quad 3.08, \quad 3.8$$

The successive ordinates and their derivatives become

    $y_1 = y_1' =  1.22140276$
    $y_2 = y_2' =  2.50929039$
    $y_3 = y_3' =  7.38905610$
    $y_4 = y_4' = 21.75840240$
    $y_5 = y_5' = 44.70118449$

The weights $w_i$ of the ordinates have to be multiplied by 2, the
weights $w_i'$ of the derivatives by $2^2 = 4$. The result of the weighting
and summing is

$$A = 53.37259512 + 0.22554004 = 53.59813516$$

The error is now

$$\eta = 0.00001488$$

and we see that the full accuracy of the Gaussian procedure is
preserved.
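
The bookkeeping of the two scale factors can be made explicit in code. The following minimal Python sketch assumes the weights $w_i$ and $w_i'$ tabulated above for the zeros ±0.54, ±0.90 and 0; the function name `quad_with_derivatives` is illustrative. The ordinates are scaled by $(b-a)/2$ and the derivative terms by $[(b-a)/2]^2$, because a derivative with respect to $\xi$ carries one more factor of the half-range than a derivative with respect to x.

```python
import math

# Rounded zeros with ordinate weights w and derivative weights w' on [-1, 1].
XI = [-0.90, -0.54, 0.0, 0.54, 0.90]
W = [0.23640530, 0.47899553, 0.56919830, 0.47899553, 0.23640530]
W_PRIME = [-0.00155377, 0.00058042, 0.0, -0.00058042, 0.00155377]


def quad_with_derivatives(f, df, a, b):
    """Quadrature using f and f' at the rounded zeros, mapped to [a, b]."""
    half = (b - a) / 2.0
    mid = (a + b) / 2.0
    xs = [mid + half * xi for xi in XI]
    ordinate_part = half * sum(w * f(x) for w, x in zip(W, xs))
    derivative_part = half ** 2 * sum(wp * df(x) for wp, x in zip(W_PRIME, xs))
    return ordinate_part, derivative_part


main_term, correction = quad_with_derivatives(math.exp, math.exp, 0.0, 4.0)
area = main_term + correction

if __name__ == "__main__":
    print(f"{main_term:.8f} + {correction:.8f} = {area:.8f}")
```

For a tabulated function, `df` could be replaced by a difference quotient of two neighboring tabular values, since the derivative weights are small.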

Once more we can estimate the error on the basis of the method
described in § 12. Once more we obtain the formula (12.4) but with
the following modifications. The correction in A, caused by the
weights $w_i'$, is multiplied by 2. Moreover, we have to add one more
term of the following form:

$$-\left(\frac{b-a}{2}\right)^{3} \sum_{i=1}^{n} w_i'\,\xi_i\,f''(x_i)$$

In our numerical example we obtain

$$\eta' = 111.19630006 - 53.59813516 - 0.22554004 - 4 \cdot 14.229894611 - 8 \cdot 0.0566116792 = 0.000153$$

Division by $2n + 1 = 11$ gives $\eta = 0.0000139$. The agreement with
the actual error is again satisfactory.




16. Engineering applications of the Gaussian quadrature method. 
The Gaussian quadrature method is characterized by very high 
accuracy. Even a small number of ordinates gives usually a very 
accurate evaluation of a definite integral. In problems of engineering, 
excessive accuracy is seldom required. The Gaussian quadrature 
method has its place, however, as an excellent device for economizing 
in the number of ordinates. It happens rather frequently that the 
average value of a function of unknown structure has to be established 
on the basis of very few observations. In this case it is strongly 
advocated that the points where the ordinates are measured shall 
follow the Gaussian pattern. 

For example, in our standard numerical problem of evaluating the
definite integral

$$A = \int_0^4 e^x\,dx$$

the use of Simpson’s rule, employing nine equidistant ordinates,
gave an error of 0.02 in 54 units, i.e., an accuracy of 0.04%. Such
accuracy will seldom be required in an engineering problem. Let us
now cut down the number of ordinates to three. The Gaussian
procedure requires that these ordinates shall be placed at the following
x values and taken into account with the following weight
factors:

    x = 0.45    w = 1.11
        2           1.78
        3.55        1.11

The calculation gives

$$A = 53.535$$

compared with the true value

$$A = 53.598$$

The error is

$$\eta = 0.063$$

This error is three times as large as the error of nine ordinates but
still very acceptable. Yet the number of ordinates was only three.
Hence use of the Gaussian quadrature method is strongly indicated
if for some reasons we have to economize on the number of ordinates
employed for establishment of the average value of an unknown
function. Had we used three equidistant ordinates in the above
example, we would have obtained the value

$$A = 56.77$$

The error is now

$$\eta = -3.2$$

Hence the error is 50 times as big as when using the Gaussian method.
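
The comparison of three Gaussian ordinates with three equidistant ordinates can be repeated with a short program. This is a minimal sketch, assuming the unrounded three-point Gaussian zeros $\pm\sqrt{3/5}$ and weights 5/9, 8/9, 5/9 on [-1, 1] (the table above lists them rounded and scaled to the range [0, 4]); the function names are illustrative. With unrounded data the Gaussian error comes out slightly different from the rounded hand calculation, but of the same size.

```python
import math


def gauss3(f, a, b):
    """Three-point Gaussian quadrature on [a, b] with unrounded zeros."""
    half, mid = (b - a) / 2.0, (a + b) / 2.0
    r = math.sqrt(3.0 / 5.0)
    return half * (5.0 / 9.0 * f(mid - half * r)
                   + 8.0 / 9.0 * f(mid)
                   + 5.0 / 9.0 * f(mid + half * r))


def simpson3(f, a, b):
    """Simpson's rule with three equidistant ordinates on [a, b]."""
    h = (b - a) / 2.0
    return h / 3.0 * (f(a) + 4.0 * f(a + h) + f(b))


exact = math.exp(4.0) - 1.0
gauss_error = exact - gauss3(math.exp, 0.0, 4.0)
simpson_error = exact - simpson3(math.exp, 0.0, 4.0)

if __name__ == "__main__":
    print(f"Gauss   error: {gauss_error:+.4f}")
    print(f"Simpson error: {simpson_error:+.4f}")
    print(f"ratio of magnitudes: {abs(simpson_error / gauss_error):.0f}")
```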

The pressure tubes in an airduct will give much more favorable 
results if they are not uniformly distributed over the cross section of 
the airduct but in conformity with the Gaussian zeros. The same 
holds for temperature measurements along a wall or for temperature 
measurements spread over a certain time interval if the purpose of 
these measurements is to establish average values. 

The Gaussian zeros and the associated weight factors have been 
calculated with great accuracy and are available in tabular form. 
The Table XIII of the Appendix containing these data is taken from 
the calculations of the Mathematical Tables Project in New York. 
The same project evaluated the weight factors which belong to the 
rounded off values of the Gaussian zeros. Part of this table is 
included in the Appendix (cf. Table XIV), with the permission of the 
Project. 


17. Simpson’s formula with end correction. We have seen that the
error caused by a small shift of the Gaussian zeros could be counteracted
by adding the knowledge of the derivatives to the knowledge
of the functional values. The weights of the derivatives at symmetrically
placed points entered with ± signs. We can take advantage of
this property of the weights $w_i'$ for a modification of Simpson’s rule
which greatly increases its accuracy at the cost of a small additional
calculation. In § 4 we discussed Simpson’s method. It consisted in
dividing the entire range into an even number of panels and approximating
every double panel by a parabola of second order. Let us
concentrate on such a double panel and assume that both the
mid-point and the two end points shall be taken as double points.
Then in effect we approximate by a parabola of fifth order, and the
error will be proportional to the sixth derivative, instead of the
previous fourth derivative. For sufficiently smooth functions, the
accuracy of the formula is thus greatly increased. On the other hand,
we now have to know function and derivative at every panel point. In
actual fact, however, since every panel point (with the exception of the
two end points) is the terminal point of one panel and at the same
time the starting point of the next panel, the derivatives enter with
the weights $w' - w' = 0$; only the two end points behave differently,
and thus we have to know the derivatives only at these two points.

The evaluation of the weights is here so simple that we need not
take recourse to the general procedure of § 13. We can solve the
linear equations for the weights directly, making use of the fact that
for the powers $1, x, \ldots, x^5$ we must get exact results. The odd
powers can be neglected, since for them the equations balance
automatically, because of symmetry. Hence only $y = 1, x^2, x^4$ have
to be tried. We have 3 unknowns, viz., the two weights

    x =   −1      0      1
          w_1    w_0    w_1

and the third weight

         −w_1'    0     w_1'

The weight $w_0'$ can be equated to zero in advance, because of
symmetry. Now we have for the three trial functions

    y = 1:     y(−1) = 1,   y(0) = 1,   y(1) = 1;    y'(−1) =  0,   y'(1) = 0
    y = x^2:   y(−1) = 1,   y(0) = 0,   y(1) = 1;    y'(−1) = −2,   y'(1) = 2
    y = x^4:   y(−1) = 1,   y(0) = 0,   y(1) = 1;    y'(−1) = −4,   y'(1) = 4

This gives the three conditions:

$$2w_1 + w_0 = 2, \qquad 2w_1 + 4w_1' = \tfrac{2}{3}, \qquad 2w_1 + 8w_1' = \tfrac{2}{5} \tag{6-17.1}$$

from which

$$w_1 = \tfrac{7}{15}, \qquad w_0 = \tfrac{16}{15}, \qquad w_1' = -\tfrac{1}{15} \tag{6-17.2}$$

and the resulting formula becomes

$$A = \tfrac{1}{15}\left[7\big(f(-1) + f(1)\big) + 16 f(0) + f'(-1) - f'(1)\right] \tag{6-17.3}$$


If the distance between neighboring ordinates is not 1 but h, the
formula has to be modified as follows.

$$A = \frac{h}{15}\left[7(y_0 + y_2) + 16 y_1 + h(y_0' - y_2')\right] \tag{6-17.4}$$

where $y_0, y_1, y_2$ indicate the three successive ordinates.

Example. The function

$$y = \sin x$$

between $x = 0$ and $\pi$ is roughly of a parabolic shape. Hence we get
a satisfactory approximation of this function by giving it in the
three equidistant points,

    x = 0     π/2     π
    y = 0      1      0

and approximating it by a parabola of second order. In this problem

$$h = \frac{\pi}{2}$$

Moreover:

$$y_0' = 1, \qquad y_2' = -1$$

Application of Simpson’s rule gives

$$A = \frac{h}{3}(y_0 + 4y_1 + y_2) = \frac{2\pi}{3} = 2.094$$

while the true area is

$$A = \int_0^{\pi} \sin x\,dx = -\Big[\cos x\Big]_0^{\pi} = 2$$

The error

$$\eta = -0.094$$

is thus only 4.7%.

The new formula gives for the same area,

$$A = \frac{\pi}{30}(16 + \pi) = 2.0045$$

The error

$$\eta = -0.0045$$

is 20 times smaller than before.


We can motivate this increase of accuracy by an estimate of the
error. In Simpson’s case the estimated error becomes, according to
the mean value theorem of integral calculus:

$$\eta = -\frac{f^{(4)}(\theta)}{4!}\int_{-1}^{1} x^2(1 - x^2)\,dx = -\frac{f^{(4)}(\theta)}{90} \tag{6-17.5}$$

and generally (panel width h, limits a, b),

$$\eta = -\frac{f^{(4)}(\theta)}{180}\,h^4(b - a) \tag{6-17.6}$$

In the case of the formula (3), however, we obtain

$$\eta = \frac{f^{(6)}(\theta)}{6!}\int_{-1}^{1} x^2(1 - x^2)^2\,dx = \frac{f^{(6)}(\theta)}{4725} \tag{6-17.7}$$

and in the general case

$$\eta = \frac{f^{(6)}(\theta)}{9450}\,h^6(b - a) \tag{6-17.8}$$

This gives, applied to our numerical example, the estimates

    η = −0.106      (Simpson)
    η = −0.00499    (modified Simpson)

The great accuracy of these error estimates is explainable by the fact
that in our example, the point of maximum of $f^{(n)}(\theta)$ and the mid-point
of the range coincide.

Let us now divide a given range into an even number of panels, as 
we have done before in applying Simpson’s parabolic approximation. 
We apply our formula to each double panel and put these areas 
together. The correction term in y’ cancels out at all inside points, 
since it comes in with alternate sign from both sides. The only 
correction which remains is that at the two end points of the range. 
The resulting formula becomes 


$$A = \frac{h}{15}\Big[14\big(\tfrac{1}{2}y_0 + y_2 + y_4 + \cdots + \tfrac{1}{2}y_{2n}\big) + 16\big(y_1 + y_3 + y_5 + \cdots + y_{2n-1}\big) + h\big(y_0' - y_{2n}'\big)\Big] \tag{6-17.9}$$



This formula shows that at a relatively little sacrifice, namely, adding 
the derivatives at the two end points of the range, we gain very 
considerably in accuracy. Our approximation is now of fifth order. 
That means our formula is exact for any f(x) which can be repre- 
sented by an arbitrary power expansion of fifth order. Simpson’s 
rule gives exact results for a power expansion of third order only. The 
two additional powers mean a large increase in accuracy in the case 
of smooth functions. 

Numerical example. We go back to the numerical example of § 8
and apply the formula (9) to the nine ordinates of that problem. At
present $h = \tfrac{1}{2}$ and

$$y = e^x$$

If we substitute the numerical values in (9) we obtain

$$A = 53.5980641$$

compared with the true value

$$A = 53.5981500$$

The error is thus

$$\eta = 0.000086$$

Simpson’s formula gave the error −0.0181. The new error is 200
times smaller. This shows the great effectiveness of the end correction.
The estimated increase of accuracy is given by the ratio

$$\frac{f^{(6)}(\theta)}{f^{(4)}(\theta)} \cdot \frac{90\,h^2}{4725}$$

This gives in our example the fraction 1/210; i.e., the error of the
corrected formula is an estimated 210 times smaller than that of the
uncorrected formula, in good agreement with the facts.
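
The end-corrected rule is short enough to state as a program. The following minimal Python sketch implements formula (9) for an even number of panels; the function name `simpson_end_corrected` and its argument layout are assumptions made for illustration. The derivative is needed only at the two end points a and b.

```python
import math


def simpson_end_corrected(f, df, a, b, panels):
    """Simpson's rule with end correction; `panels` must be even."""
    if panels < 2 or panels % 2:
        raise ValueError("panels must be a positive even number")
    h = (b - a) / panels
    y = [f(a + k * h) for k in range(panels + 1)]
    # Even-indexed ordinates: the two end values enter with half weight.
    even = 0.5 * (y[0] + y[-1]) + sum(y[2:-1:2])
    odd = sum(y[1::2])
    return h / 15.0 * (14.0 * even + 16.0 * odd + h * (df(a) - df(b)))


if __name__ == "__main__":
    # Nine ordinates of e^x on [0, 4], i.e. eight panels of width 1/2.
    print(simpson_end_corrected(math.exp, math.exp, 0.0, 4.0, 8))
    # One double panel of sin x on [0, pi].
    print(simpson_end_corrected(math.sin, math.cos, 0.0, math.pi, 2))
```

The two calls correspond to the two worked examples of this section, the exponential with nine ordinates and the sine arch with three.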


18. Quadrature involving exponentials. In many problems of
applied analysis a certain “integral transform” of the following form
is encountered:

$$F(p) = \int_a^b f(x)\,e^{px}\,dx \tag{6-18.1}$$

We assume that f(x) is given in tabulated form, x proceeding in
equidistant intervals $\Delta x = h$. We assume that this interval is so
small that linear interpolation is sufficient for functional values
which lie between the tabulated points. In this case we could use
the simple trapezoidal rule (3.5) for the numerical evaluation of the
integral (1), were it not for the exponential factor $e^{px}$. If p is small
enough, the trapezoidal rule would still hold. But we may need the
value of the integral (1) for larger values of p. Hence it is of advantage
to know how to evaluate an integral of the form (1) for sufficiently
closely tabulated f(x), but without making any restrictions concerning
p, which may assume any real or imaginary or complex values.

We interpolate f(x) linearly from panel to panel, starting at the
mid-point $x_k + \tfrac{1}{2}h$ of each panel and proceeding to the two points
$(x_k + \tfrac{1}{2}h) \pm \tfrac{1}{2}h$. We now perform the integration in each panel and
form the sum. The result is the trapezoidal formula but with
certain corrections. Let the result of the trapezoidal summation
procedure be S(p). Then


$$F(p) = \sigma^2\!\left(\tfrac{p}{2}\right) S(p) + \frac{\sigma(p) - 1}{p}\left[f(a)\,e^{pa} - f(b)\,e^{pb}\right] \tag{6-18.2}$$

where

$$\sigma(p) = \frac{\sinh ph}{ph} \tag{6-18.3}$$

This formula has many applications and is particularly useful if the
Fourier coefficients of an empirically given function are to be
determined (in which case p is purely imaginary). The uncorrected
sum S(p) gives the coefficients of the finite trigonometric series which
passes through the given points (cf. IV, 11-15). Hence the formula
(2) can be conceived as an expression of the relation between the true
Fourier coefficients (demanded for example in the acoustical
analysis of an empirically given function), and the coefficients
obtained by trigonometric interpolation.
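
The corrected trapezoidal sum can be tried out numerically. The sketch below is a minimal Python illustration under stated assumptions: S(p) is taken as the trapezoidal sum of the products $f(x_k)e^{px_k}$, $\sigma(p)$ is $\sinh(ph)/(ph)$, and the function names `sigma` and `exp_transform` are hypothetical. Since the formula rests on piecewise linear interpolation of f, it should reproduce the integral exactly (up to rounding) whenever f itself is linear, for real or complex p.

```python
import cmath


def sigma(z):
    """sinh(z)/z, with the limiting value 1 at z = 0."""
    return 1.0 if z == 0 else cmath.sinh(z) / z


def exp_transform(samples, a, h, p):
    """Integral of f(x) exp(p x) over [a, a + n h] for tabulated f.

    samples[k] is f(a + k h); f is treated as linear within each panel.
    """
    n = len(samples) - 1
    terms = [fk * cmath.exp(p * (a + k * h)) for k, fk in enumerate(samples)]
    # Plain trapezoidal sum S(p) of the products f(x) exp(p x).
    trapezoid = h * (sum(terms) - 0.5 * (terms[0] + terms[n]))
    if p == 0:
        return trapezoid
    end_correction = (sigma(p * h) - 1.0) / p * (terms[0] - terms[n])
    return sigma(p * h / 2.0) ** 2 * trapezoid + end_correction


if __name__ == "__main__":
    # f(x) = x on [0, 1] with h = 1/4 and p = 2; the exact value is (e^2 + 1)/4.
    value = exp_transform([0.0, 0.25, 0.5, 0.75, 1.0], 0.0, 0.25, 2.0)
    print(value.real, (cmath.exp(2.0).real + 1.0) / 4.0)
```

For Fourier coefficients p would be passed as a purely imaginary number, for example `1j * k` for the k-th harmonic.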


19. Quadrature by differentiation. Many functions of applied 
analysis are defined by a certain differential equation. If a function 
of this kind has to be integrated, we might consider it desirable to 
base the integration on the knowledge of the function and its 
derivatives at the two end points of the range, since the successive 
derivatives are easily calculable from the defining differential 
equation if we know the boundary values at the two end points. Our 
problem is then to obtain an effective quadrature formula which uses 
no inside ordinates but only the two end ordinates and its derivatives. 




A formula of this kind makes use of the end information not for 
increased accuracy, but for complete evaluation of a definite integral. 
We may also say that our ordinates are now distributed in an 
extreme fashion inasmuch as they crowd infinitely near to the two 
end points of the given range. The ordinary Taylor expansion 
corresponds to the case when all the given ordinates are infinitely 
near to one point of the range. But we want to assume that both end 
points are equally represented. 

We start with an elementary formula of integral calculus, based on
the method of integrating by parts.

$$\int_a^b u(x)\,v^{(n)}(x)\,dx - (-1)^n\int_a^b v(x)\,u^{(n)}(x)\,dx = \Big[u\,v^{(n-1)} - u'\,v^{(n-2)} + \cdots\Big]_a^b = \Big[\sum_{k=0}^{n-1} (-1)^k\,u^{(k)}(x)\,v^{(n-k-1)}(x)\Big]_a^b \tag{6-19.1}$$

We will make the following use of this formula. The range of
integration shall be normalized to [0, 1]. Moreover, we choose

$$u(x) = f(x), \qquad v(x) = \frac{p_n(x)}{\gamma_n\,n!} \tag{6-19.2}$$

where the polynomial

$$p_n(x) = \gamma_n x^n + \gamma_{n-1} x^{n-1} + \cdots + \gamma_0 \tag{6-19.3}$$

is freely at our disposal.
With this choice of the functions u(x) and v(x) the formula (1) may
be written as follows.

$$\int_0^1 f(x)\,dx = \frac{1}{\gamma_n\,n!}\Big[\sum_{k=0}^{n-1} (-1)^k f^{(k)}(x)\,p_n^{(n-k-1)}(x)\Big]_0^1 + \eta_n \tag{6-19.4}$$

where $\eta_n$ stands for the definite integral

$$\eta_n = \frac{(-1)^n}{\gamma_n\,n!}\int_0^1 p_n(x)\,f^{(n)}(x)\,dx \tag{6-19.5}$$

Formula (4) can be conceived as a quadrature formula which obtains
the area under the curve solely in terms of boundary values, given at
the two end points x = 0 and x = 1 of the range:

$$A = \frac{1}{\gamma_n\,n!}\Big[\sum_{k=0}^{n-1} (-1)^k f^{(k)}(x)\,p_n^{(n-k-1)}(x)\Big]_0^1 \tag{6-19.6}$$

whereas $\eta_n$, given in the form (5), represents the remainder of our
quadrature formula.

We first concentrate on this remainder. Our aim will be to dispose
of $p_n(x)$ in such a way that the remainder (5) shall become particularly
small. One choice is of particular interest here because it
translates the outstanding features of the Gaussian quadrature method 
to our present problem. In discussing the Gaussian method (cf. § 10) 
we have seen that while an arbitrary distribution of the points of 
interpolation led to a quadrature formula which gives exact results 
for an arbitrary polynomial of not higher than (n — 1)st order, the 
Gaussian points of interpolation gave a quadrature formula which 
yields exact results for any polynomial of (2n — 1)st order. 

In our case we have no choice concerning the points of interpolation,
since we have decided already that our quadrature will be
based on the boundary values of the given function and its derivatives
at both end points of the range. But the polynomial $p_n(x)$ is still
freely at our disposal. The form (5) of the remainder shows that our
quadrature formula will be exact for any polynomial of not higher
than $(n-1)$st order, but the remainder will not vanish generally for
a polynomial of still higher order. For one particular choice of
$p_n(x)$, however, we can make the quadrature formula (4) exact for
any polynomial up to the order $2n - 1$.

Let us consider the Legendre polynomials $P_n(x)$ (cf. V, 20), but
renormalized to the range [0, 1], instead of the traditional range
[-1, 1]. We will denote these polynomials by $P_n^*(x)$. They are
directly expressible in terms of the Gaussian hypergeometric function
$F(\alpha, \beta, \gamma; x)$ [cf. 5-20.11].

$$P_n^*(x) = F(-n, n+1, 1; x) = 1 - \frac{n(n+1)}{(1!)^2}\,x + \frac{(n-1)n(n+1)(n+2)}{(2!)^2}\,x^2 - \cdots = \sum_{k=0}^{n} (-1)^k \frac{(n+k)!}{(n-k)!\,(k!)^2}\,x^k \tag{6-19.7}$$

The polynomial $P_n^*(x)$ has the following remarkable property. It can
be written as the nth derivative of a function which vanishes together
with all its derivatives up to the order $n - 1$ at the two end points
x = 0 and x = 1 of the range.

$$P_n^*(x) = \frac{1}{n!}\,\frac{d^n}{dx^n}\big[x(1 - x)\big]^n \tag{6-19.8}$$

But then we can make use of the integral transformation (1),
identifying u(x) with $f^{(n)}(x)$ and v(x) with $[x(1 - x)]^n$. The boundary
terms on the right side drop out entirely, on account of the properties
of v(x). What remains can be written

$$\eta_n = \frac{(-1)^n}{\gamma_n\,n!}\int_0^1 P_n^*(x)\,f^{(n)}(x)\,dx = \frac{1}{\gamma_n\,(n!)^2}\int_0^1 f^{(2n)}(x)\,\big[x(1 - x)\big]^n\,dx \tag{6-19.9}$$

By definition, $\gamma_n$ denotes the coefficient of the highest power of
$P_n^*(x)$. According to (7) we obtain

$$\gamma_n = (-1)^n\,\frac{(2n)!}{(n!)^2} \tag{6-19.10}$$

Moreover, the Legendre polynomials $P_n^*(x)$ possess the symmetry
property

$$P_n^*(x) = (-1)^n P_n^*(1 - x) \tag{6-19.11}$$

Because of this property, the sum on the right side of (6) becomes
reducible from 2n to n terms.

$$A = \frac{1}{\gamma_n\,n!}\sum_{k=0}^{n-1} (-1)^{k+1}\,P_n^{*(n-k-1)}(0)\,y_k \tag{6-19.12}$$

where

$$y_k = f^{(k)}(0) + (-1)^k f^{(k)}(1) \tag{6-19.13}$$

Now the expansion (7) gives

$$P_n^{*(k)}(0) = (-1)^k\,\frac{(n+k)!}{(n-k)!\,k!} \tag{6-19.14}$$

and we will denote

$$(-1)^k P_n^{*(k)}(0) = C_n^k = \frac{(n+k)!}{(n-k)!\,k!} \tag{6-19.15}$$

With this notation the resulting quadrature formula becomes

$$\int_0^1 f(x)\,dx = \frac{n!}{(2n)!}\sum_{k=0}^{n-1} C_n^{\,n-k-1}\,y_k \tag{6-19.16}$$

For arbitrary limits [a, b] the formula (16) becomes modified as
follows. We define

$$y_k = f^{(k)}(a) + (-1)^k f^{(k)}(b) \tag{6-19.17}$$

and obtain

$$\int_a^b f(x)\,dx = \frac{n!}{(2n)!}\sum_{k=0}^{n-1} C_n^{\,n-k-1}\,(b - a)^{k+1}\,y_k \tag{6-19.18}$$

The coefficients $C_n^k$ are tabulated in Table XV of the Appendix.

As an example, let us assume that the function and its first and
second derivatives are given at both end points of the range. Then
n = 3, and we obtain

$$A = \tfrac{1}{2}f(a)h + \tfrac{1}{10}f'(a)h^2 + \tfrac{1}{120}f''(a)h^3 + \tfrac{1}{2}f(b)h - \tfrac{1}{10}f'(b)h^2 + \tfrac{1}{120}f''(b)h^3$$

with $h = b - a$.
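
A quadrature that uses only end-point data is compact in code. The following minimal Python sketch assumes the coefficients $C_n^k = (n+k)!/[(n-k)!\,k!]$ and the weighting $n!/(2n)!$ described above; the function names `c_coeff` and `end_point_quadrature` are illustrative, and the caller is assumed to supply the derivatives of orders 0 to n − 1 at both ends.

```python
from math import exp, factorial


def c_coeff(n, k):
    """C_n^k = (n + k)! / ((n - k)! k!); always an integer."""
    return factorial(n + k) // (factorial(n - k) * factorial(k))


def end_point_quadrature(derivs_a, derivs_b, a, b):
    """Integral of f over [a, b] from derivatives at the two end points.

    derivs_a[k] and derivs_b[k] hold the k-th derivative of f at a and b,
    for k = 0, ..., n - 1 (k = 0 is the function value itself).
    """
    n = len(derivs_a)
    if n == 0 or n != len(derivs_b):
        raise ValueError("need the same positive number of data at both ends")
    total = 0.0
    for k in range(n):
        y_k = derivs_a[k] + (-1) ** k * derivs_b[k]
        total += c_coeff(n, n - k - 1) * (b - a) ** (k + 1) * y_k
    return total * factorial(n) / factorial(2 * n)


if __name__ == "__main__":
    # n = 3 on [0, 1] for f(x) = x^5: exact for polynomials up to order 5.
    print(end_point_quadrature([0, 0, 0], [1, 5, 20], 0.0, 1.0))
    # n = 5 on [0, 4] for f(x) = e^x, whose derivatives all equal e^x.
    print(end_point_quadrature([1.0] * 5, [exp(4.0)] * 5, 0.0, 4.0))
```

For n = 3 the weights reduce to the fractions 1/2, 1/10 and 1/120 of the example above.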

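The error $\eta_n$ of the quadrature formula (16) can be estimated as
follows. Since the weight function $[x(1 - x)]^n$ does not change its
sign throughout the interval [0, 1], we obtain by the theorem of
weighted means,

$$\frac{\displaystyle\int_0^1 f^{(2n)}(x)\,\big[x(1 - x)\big]^n\,dx}{\displaystyle\int_0^1 \big[x(1 - x)\big]^n\,dx} = f^{(2n)}(\theta) \tag{6-19.19}$$

where $\theta$ is some point within the interval [0, 1]. Now

$$\int_0^1 \big[x(1 - x)\big]^n\,dx = \frac{\sqrt{\pi}\,n!}{2^{2n+1}\,\Gamma(n + \tfrac{3}{2})} \tag{6-19.20}$$

and thus

$$\eta_n = \frac{(-1)^n \sqrt{\pi}\,n!}{2^{2n+1}\,\Gamma(n + \tfrac{3}{2})\,(2n)!}\,f^{(2n)}(\theta) \tag{6-19.21}$$

and more generally for the case of arbitrary limits,

$$\eta_n = \frac{(-1)^n \sqrt{\pi}\,n!}{\Gamma(n + \tfrac{3}{2})\,(2n)!}\left(\frac{b - a}{2}\right)^{2n+1} f^{(2n)}(\theta) \tag{6-19.22}$$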

The uncertainty in the position of the point $\theta$ can be alleviated if we
can assume that $f^{(2n)}(x)$ does not change too violently within the
interval [a, b]. The function $[x(1 - x)]^n$ cuts out a narrow “window”
around the point $x = \tfrac{1}{2}$ because, if n is large enough, the function
falls off rapidly on both sides of that point. The weighted mean (19)
is thus heavily loaded in favor of the immediate neighborhood of
$x = \tfrac{1}{2}$. If f(x) is sufficiently smooth, we can identify $\theta$ with the
point $x = \tfrac{1}{2}$ and in the general case with the point

$$\theta = \frac{a + b}{2}$$

Furthermore, Stirling’s formula for the factorial function shows that
for estimation purposes we may put

$$\frac{n!}{\Gamma(n + \tfrac{3}{2})} = \frac{1}{\sqrt{n}} \tag{6-19.23}$$

Under these conditions we obtain the following realistic estimation
of the error of the quadrature formula (18).

$$\eta_n = (-1)^n \sqrt{\frac{\pi}{n}}\;\frac{f^{(2n)}\!\left(\frac{a+b}{2}\right)}{(2n)!}\left(\frac{b - a}{2}\right)^{2n+1} \tag{6-19.24}$$

The general character of this formula is quite similar to the one
valid in the Gaussian quadrature¹ [cf. (12.1)]. The numerical factor
is quite different, however, in the one and in the other case. The ratio
of the two factors is

$$\mu_n = \frac{(-4)^n}{\sqrt{\pi n}} \tag{6-19.25}$$

in favor of the Gaussian quadrature; i.e., the estimated error of the
quadrature formula based on 2n boundary values is $\mu_n$ times larger
than the estimated error of the Gaussian quadrature, using n
interior points.

1 For this reason the error estimation method of § 12 is applicable again. As
test function we use the function $[(2x - 1) f(x)]'$ (assuming the range [0, 1]), for
which the result of the quadrature is $f(1) + f(0)$. Hence $\eta'$ is again explicitly at
our disposal and once more the estimated error of the original quadrature
becomes $\eta = \eta'/(2n + 1)$.

This is understandable, however, if we realize how badly handi- 
capped we are by taking our information completely from the two 
boundaries of the interval, instead of using judiciously chosen 
ordinates of the inside. The importance of the method lies in the fact, 
however, that it is frequently so much easier to get the function and 
its derivatives at the two end points of the interval than to evaluate 
the functional values at some inside points. This is particularly true 
if an untabulated and unknown function is defined by a differential 
equation. Then application of the quadrature formula (16) can lead 
to an effective method of solving the given differential equation 
(cf. § 21). 


20. The exponential function. A particular example which is
well adapted to demonstrate the power of the method is the
exponential function

$$f(x) = e^{\alpha x} \tag{6-20.1}$$

integrated between 0 and 1. Here we know that the result of the
integration is

$$\int_0^1 e^{\alpha x}\,dx = \frac{1}{\alpha}\,(e^{\alpha} - 1) \tag{6-20.2}$$

Collecting all the terms with $e^{\alpha}$ and replacing $\alpha$ by x we obtain a
rational approximation of the exponential function $e^x$ which appears
in the following form.¹

$$e^x = \frac{\displaystyle\sum_{k=0}^{n} C_n^{\,n-k}\,x^k}{\displaystyle\sum_{k=0}^{n} C_n^{\,n-k}\,(-x)^k} \tag{6-20.3}$$


1 The author is indebted to his friend Charles Davis, numerical analyst, 
North American Aviation for pointing out to him that this approximation was 
found earlier by P. M. Hummel and C. L. Seebeck, “A Generalization of Taylor’s 
Theorem,” Association Monthly, 56, 243—247 (1949). 




The first four approximations (n = 1, 2, 3, 4) are given as follows.

$$\frac{2 + x}{2 - x}, \qquad \frac{12 + 6x + x^2}{12 - 6x + x^2}, \qquad \frac{120 + 60x + 12x^2 + x^3}{120 - 60x + 12x^2 - x^3}, \qquad \frac{1680 + 840x + 180x^2 + 20x^3 + x^4}{1680 - 840x + 180x^2 - 20x^3 + x^4} \tag{6-20.4}$$

The osculation with $e^x$ is of the order 2n; that means that by
expanding these ratios into powers of x, the agreement with the
coefficients of the Taylor expansion extends up to the term of the
order 2n, although the number of coefficients at our disposal is
only n.

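If we put x = 1, we obtain successive rational convergents of the
transcendental number e which are of astonishing precision.

$$\begin{aligned}
e &= \frac{3}{1} = 3 \\
  &= \frac{19}{7} = 2.714 && (\eta = 4 \cdot 10^{-3}) \\
  &= \frac{193}{71} = 2.71831 && (\eta = -3 \cdot 10^{-5}) \\
  &= \frac{2721}{1001} = 2.7182817 && (\eta = 1 \cdot 10^{-7}) \\
  &= \frac{49171}{18089} = 2.7182818287 && (\eta = -3 \cdot 10^{-10}) \\
  &= \frac{1084483}{398959} = 2.7182818284586 && (\eta = 4 \cdot 10^{-13})
\end{aligned} \tag{6-20.5}$$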
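
The rational convergents can be generated in exact arithmetic. The sketch below is a minimal Python illustration, assuming that the numerator and denominator of the nth approximation carry the coefficients $C_n^{\,n-k} = (2n-k)!/[k!\,(n-k)!]$ with alternating signs in the denominator, as the four displayed cases suggest; the function names `c_coeff` and `exp_rational` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def c_coeff(n, k):
    """C_n^k = (n + k)! / ((n - k)! k!); always an integer."""
    return factorial(n + k) // (factorial(n - k) * factorial(k))


def exp_rational(n, x):
    """n-th rational approximation of exp(x), evaluated exactly."""
    x = Fraction(x)
    numerator = sum(c_coeff(n, n - k) * x ** k for k in range(n + 1))
    denominator = sum(c_coeff(n, n - k) * (-x) ** k for k in range(n + 1))
    return numerator / denominator


# Successive convergents of e, obtained by putting x = 1.
convergents = [exp_rational(n, 1) for n in range(1, 7)]

if __name__ == "__main__":
    for n, c in enumerate(convergents, start=1):
        print(n, c, float(c))
```

Working with `Fraction` keeps numerator and denominator as integers, so the printed ratios can be compared digit by digit with a hand calculation.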


It is also of interest to apply our quadrature formula to the same
numerical example that we employed in demonstrating the power of
the Gaussian quadrature (cf. § 11). Here we had n = 5, a = 0,
b = 4, and the given function was $y = e^x$. The application of the
formula (19.18), taking the coefficients $C_5^k$ from Table XV of the
Appendix, gives

$$A = \frac{1}{30240}\Big[15120 \cdot 4(1 + e^4) + 3360 \cdot 16(1 - e^4) + 420 \cdot 64(1 + e^4) + 30 \cdot 256(1 - e^4) + 1 \cdot 1024(1 + e^4)\Big] = 53.60174$$

Comparison with the true value A = 53.59815 shows that

$$\eta = -0.0036$$

while the Gaussian error was only $\eta = 0.000013$. The change of sign
is explained by the factor $(-1)^n$ of formula (19.24) which gives the
minus sign in the case of n = 5. Moreover, the factor $\mu_n$ [cf. (19.25)],
which is the estimated magnification of the error compared with the
Gaussian error, is now

$$|\mu_5| = \frac{4^5}{\sqrt{5\pi}} = 258$$

in close agreement with the facts.


21. Eigenvalue problems. In V, 17 and 18 we encountered a class 
of problems which play a fundamental and increasingly important 
role in all types of vibration problems associated with elasticity, 
flutter analysis, wave guides, atomic physics. They are called 
“eigenvalue problems.” The general situation encountered in such 
problems can be described as follows. Given a certain linear 
differential operator which contains an unknown constant parameter
$\lambda$, usually called the “eigenvalue,” and given certain homogeneous
boundary conditions which are such that without the proper choice
of $\lambda$ no solution outside the trivial solution y = 0 is possible, the
problem is to find the smallest $\lambda_1$, or a few of the smallest $\lambda_i$, which
make a solution possible.

In such eigenvalue problems the quadrature method of § 19 may 
be of considerable help, since it is based on the knowledge of the 
function and its derivatives at the two end points of the interval, and 
these quantities are available on the basis of the given differential 
equation and the given boundary conditions. 

In order to show the operation of the method, we choose a simple
example, but the method is applicable under much more complicated
conditions. It is our aim, however, to study the essential features of
the method, unhampered by technical difficulties. For this reason we
choose a simple differential operator of second order, with constant
coefficients:

$$y'' + y' + \lambda y = 0 \tag{6-21.1}$$

with the boundary conditions

$$y(0) = 0, \qquad y'(1) = 0 \tag{6-21.2}$$

The given interval is thus normalized to [0, 1].
Since a linear homogeneous differential equation leaves an
amplitude factor undetermined, we can assign arbitrarily the value

$$y'(0) = 1$$

to the derivative at x = 0. Then the differential equation (1)
determines uniquely the values of all the derivatives at x = 0. We
obtain these by successive differentiation, or by substituting into the
differential equation the power expansion

$$y = a_0 + a_1 x + a_2 x^2 + \cdots$$

We collect terms and put the resulting coefficient of $x^k$ equal to zero.
In our simple problem we get

$$a_0 = 0, \qquad a_1 = 1, \qquad a_2 = -\tfrac{1}{2}, \qquad a_3 = \frac{1 - \lambda}{6}$$

$$y(0) = 0, \qquad y'(0) = 1, \qquad y''(0) = -1, \qquad y'''(0) = 1 - \lambda \tag{6-21.3}$$

The higher we go with the evaluation of the successive coefficients,
the greater accuracy can we expect. For our present purposes we
will stop with $a_3$.

At the other end point we will put similarly

$$x = 1 + \xi, \qquad y = b_0 + b_1\xi + b_2\xi^2 + b_3\xi^3$$

Using the same method of substituting in the differential equation
and collecting terms, we obtain no value for $b_0$ but the later
coefficients all appear as linear functions of $b_0$.

$$b_0, \qquad b_1 = 0, \qquad b_2 = -\frac{\lambda b_0}{2}, \qquad b_3 = \frac{\lambda b_0}{6}$$

$$y(1) = b_0, \qquad y'(1) = 0, \qquad y''(1) = -\lambda b_0, \qquad y'''(1) = \lambda b_0 \tag{6-21.4}$$


§ 21 Eigenvalue Problems 429 
We will now apply our quadrature formula, considering y"(x) as 
our function f(x); hence f(x) = y”(x) and f'(x) = y"(x) are given 
at the two end points 
fO =-1, f(D) = —Aby 
f'O=1-4  f') = db 
The quadrature formula for n = 2 gives 


Be f 6(—1—Ab,) + 1(1 — A — Ab,) 
[ ¥@ a =0—¥@ = 
0 12 
| bo = 
E 12 


This gives the relation 
7—àÀ 
= —— 6-21. 


We now use the quadrature formula once more, considering y’(2) 


as f (x) 


fO =1, fd) =0 
f'® =-1, fD) = —Aby 
f"O)=1—A f") = Ab, 


At present we have 3 data on both ends and thus use the formula 
forn = 3 


1 
Í y'(2) de = y(1) — y(0) 
__ 60(1 + 0) + 12(—1 + Aby) + (1 — A+ Ady) 


120 
b 49 — A+ 134b 
= 120 
This gives the new relation 
49 — À 
° 120 — 134 ieee 


Equating the right sides of (5) and (6) we obtain a quadratic equation 
for determination of the eigenvalue 


$$20\lambda^2 - 554\lambda + 840 = 0$$

The two roots of this equation are
$$\lambda_1 = 1.6095, \qquad \lambda_2 = 26.0905 \tag{6-21.7}$$
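As a quick numerical check of relations (5) and (6), the following Python sketch solves the quadratic directly and confirms that both relations give the same $b_0$ at the root. The function names are illustrative only and do not come from the text. Direct evaluation in double precision gives roots near 1.6098 and 26.090, which agree with (7) to about three decimals.

```python
import math

def b0_from_n2(lam):
    """Relation (5): b0 from the n = 2 quadrature applied to y''."""
    return (7 - lam) / (7 * lam)

def b0_from_n3(lam):
    """Relation (6): b0 from the n = 3 quadrature applied to y'."""
    return (49 - lam) / (120 - 13 * lam)

def quadratic_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 (a > 0), smaller root first."""
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)

# Equating (5) and (6) gives 20*lam^2 - 554*lam + 840 = 0.
lam_small, lam_large = quadratic_roots(20.0, -554.0, 840.0)
print("roots:", lam_small, lam_large)
# Both relations must agree on b0 at a root of the quadratic.
print("b0:", b0_from_n2(lam_small), b0_from_n3(lam_small))
```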


Only the smaller root has significance, since the larger root is 
enormously sensitive to higher-order corrections. The smaller root, 
however, does not change much if we continue our differentiation 
process. We have stopped with $y'''(x)$. If we go one step further
and include $y''''(0)$ and $y''''(1)$ in our calculations, the first application
of the quadrature formula involves $n = 3$ and the second involves
$n = 4$. We then get the two relations
$$b_0 = \frac{71 - 10\lambda}{73\lambda - \lambda^2}, \qquad b_0 = \frac{679 - 18\lambda}{1680 - 201\lambda + \lambda^2}$$
which now yield the following cubic equation for $\lambda$:
$$28\lambda^3 - 4074\lambda^2 + 80638\lambda - 119280 = 0$$


This is still an essentially quadratic equation, since the cubic term
is only a small correction. We can first neglect the cubic term and
obtain a preliminary $\lambda_0$, then add the cubic term, evaluated with
this $\lambda_0$, as a correction to the absolute term. This yields for the
smallest root
$$\lambda_1 = 1.608467 \tag{6-21.8}$$
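The correction scheme just described can be sketched in a few lines of Python. The sketch below repeats the correction several times rather than once (an assumption made here so that the result is converged to machine precision); the function names are illustrative.

```python
import math

def smaller_quadratic_root(a, b, c):
    """Smaller real root of a*x^2 + b*x + c = 0."""
    disc = math.sqrt(b * b - 4 * a * c)
    return min((-b - disc) / (2 * a), (-b + disc) / (2 * a))

def cubic(lam):
    """Left side of the cubic equation for the eigenvalue."""
    return 28 * lam**3 - 4074 * lam**2 + 80638 * lam - 119280

def smallest_root_by_correction(sweeps=5):
    # Step 1: neglect the cubic term and solve the remaining quadratic.
    lam = smaller_quadratic_root(-4074.0, 80638.0, -119280.0)
    # Step 2: fold the cubic term, evaluated at the current estimate,
    # into the absolute term and solve the quadratic again.
    for _ in range(sweeps):
        lam = smaller_quadratic_root(-4074.0, 80638.0, -119280.0 + 28 * lam**3)
    return lam

lam1 = smallest_root_by_correction()
print(lam1, cubic(lam1))
```

Because the cubic term is small, each sweep reduces the error by a factor of a few thousandths, so one or two corrections already suffice in practice.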


The small change of $\lambda_1$ compared with the corresponding value found
in (7) shows that the first rather crude approximation was very close,
the error being 1 unit in the third decimal, which is an accuracy of
0.07%. In the present simple problem we can check our results,
because the exact value of $\lambda$ is theoretically available. We find
$$\lambda = \tfrac{1}{4} + \theta^2 \tag{6-21.9}$$
where $\theta$ is a solution of the transcendental equation
$$\tan\theta = 2\theta \tag{6-21.10}$$
The smallest root of this equation is
$$\theta_1 = 1.1655618$$
which gives
$$\lambda_1 = 1.608534 \tag{6-21.11}$$
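The exact value can be reproduced with a short bisection on the transcendental equation (10). This is only an illustrative sketch: the equation is rewritten as $\sin\theta - 2\theta\cos\theta = 0$ to avoid the pole of the tangent, and the bracketing interval $[1, 1.5]$ is an assumption chosen by inspection.

```python
import math

def g(theta):
    """tan(theta) = 2*theta, rewritten without the pole of the tangent."""
    return math.sin(theta) - 2 * theta * math.cos(theta)

def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection; assumes func changes sign on [lo, hi]."""
    flo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = func(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta1 = bisect(g, 1.0, 1.5)        # smallest positive root
lam_exact = 0.25 + theta1**2        # equation (9)
print(theta1, lam_exact)
```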


Hence the error of the $n = 2, 3$ approximation is $\eta = -0.0010$,
while the error of the $n = 3, 4$ approximation is only $\eta = 0.000067$.
We see that the convergence of the method if applied to a smooth
function is rapid. Of interest is also the approach from above and
below in the two successive cases.

The so-called “Rayleigh-Ritz method” of obtaining eigenvalues 
is based on minimization of a certain integral. Hence it is applicable 
only to self-adjoint differential problems. The present method does 
not require that either the differential operator or the boundary 
conditions shall be self-adjoint. It operates under more general
conditions, and in principle even linearity of the differential equation 
need not be demanded for application of the method. 

If we pursue these ideas one step further, we come to the con- 
clusion that this quadrature procedure can be applied not only to 
evaluation of eigenvalues, but also to actual solution of differential 
equations. Let us make the substitution 


$$x = \alpha x_1, \tag{6-21.12}$$
considering $x_1$ the new independent variable and $\alpha$ a given constant
parameter. The point $x_1 = 1$ in the new variable corresponds to the
point $x = \alpha$ in the old variable. Hence $y(1)$ in the new variable gives
actually the original $y(\alpha)$, and replacing $\alpha$ by $x$, we have obtained the
unknown function at the variable point $x$.

We show the operation of this method by using the previous
example. The differential equation (1) now becomes
$$y'' + \alpha y' + \alpha^2\lambda y = 0$$
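The table (3) of the initial values has to be replaced as follows:
$$a_0 = 0, \quad a_1 = \alpha, \quad a_2 = -\tfrac{1}{2}\alpha^2, \quad a_3 = \frac{1 - \lambda}{6}\,\alpha^3$$
$$y(0) = 0, \quad y'(0) = \alpha, \quad y''(0) = -\alpha^2, \quad y'''(0) = (1 - \lambda)\alpha^3$$
while the table of end values (4) becomes modified as follows (the
boundary condition $y'(1) = 0$ is no longer valid).
$$y(1) = b_0, \quad y'(1) = b_1, \quad y''(1) = -\alpha b_1 - \alpha^2\lambda b_0$$
$$y'''(1) = (1 - \lambda)\alpha^2 b_1 + \alpha^3\lambda b_0$$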


In the present problem, the unknowns are $b_0$ and $b_1$, while $\lambda$ is
already known. The procedure, however, is once more the same.
The first relation which before led to (5) becomes now
$$b_1 - \alpha = \frac{1}{12}\left[6(-\alpha^2 - \alpha b_1 - \alpha^2\lambda b_0) + (1 - \lambda)\alpha^3 - (1 - \lambda)\alpha^2 b_1 - \alpha^3\lambda b_0\right]$$
This gives the linear relation
$$(6\alpha^2\lambda + \alpha^3\lambda)\,b_0 + [12 + 6\alpha + (1 - \lambda)\alpha^2]\,b_1 = 12\alpha - 6\alpha^2 + (1 - \lambda)\alpha^3$$


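For the second relation we will simplify matters at the sacrifice of
accuracy. We will once more use the quadrature formula for $n = 2$
only, applied to the first three boundary values, and not make use of
the values $y'''(0)$ and $y'''(1)$. We now get
$$b_0 = \frac{1}{12}\left[6(\alpha + b_1) + (-\alpha^2 + \alpha b_1 + \alpha^2\lambda b_0)\right]$$
which gives the new linear relation
$$(12 - \alpha^2\lambda)\,b_0 - (6 + \alpha)\,b_1 = 6\alpha - \alpha^2$$
Solving the two linear equations for $b_0$ and $b_1$, we obtain
$$b_0 = \frac{\alpha(144 - 12\lambda\alpha^2)}{144 + 72\alpha + 12(1 + \lambda)\alpha^2 + 6\lambda\alpha^3 + \lambda^2\alpha^4}$$
$$b_1 = \frac{\alpha\left[144 - 72\alpha + 12(1 - 5\lambda)\alpha^2 + 6\lambda\alpha^3 + \lambda^2\alpha^4\right]}{144 + 72\alpha + 12(1 + \lambda)\alpha^2 + 6\lambda\alpha^3 + \lambda^2\alpha^4}$$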


We now go back to the original variable $x$; in this variable the
significance of $b_0$ and $b_1$ becomes $b_0 = y(\alpha)$ and $b_1 = \alpha y'(\alpha)$. We
thus see that we have obtained two independent approximations for
$y(x)$ and $y'(x)$ in the form
$$\bar{y}(x) = \frac{x(144 - 12\lambda x^2)}{144 + 72x + 12(1 + \lambda)x^2 + 6\lambda x^3 + \lambda^2 x^4}$$
$$\bar{y}'(x) = \frac{144 - 72x + 12(1 - 5\lambda)x^2 + 6\lambda x^3 + \lambda^2 x^4}{144 + 72x + 12(1 + \lambda)x^2 + 6\lambda x^3 + \lambda^2 x^4}$$




The second derivative y"(x) and all the higher derivatives are then 
determined by the defining differential equation (1). 

To test the accuracy of our solution, we apply it to the special
value $\lambda = -2$, in which case the explicit solution of our problem
becomes
$$y(x) = \tfrac{1}{3}\left(e^{x} - e^{-2x}\right), \qquad y'(x) = \tfrac{1}{3}\left(e^{x} + 2e^{-2x}\right)$$
while
$$\bar{y}(x) = \frac{x(144 + 24x^2)}{144 + 72x - 12x^2 - 12x^3 + 4x^4}$$
$$\bar{y}'(x) = \frac{144 - 72x + 132x^2 - 12x^3 + 4x^4}{144 + 72x - 12x^2 - 12x^3 + 4x^4}$$
At the point $x = 1$ the correct solutions are
$$y(1) = 0.8610, \qquad y'(1) = 0.9963$$
while the approximation gives
$$\bar{y}(1) = 0.8571, \qquad \bar{y}'(1) = 1$$


We see that the accuracy is very satisfactory. 
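The comparison can be repeated at other points with a small Python sketch that solves the two linear relations for $b_0$ and $b_1$ numerically (by Cramer's rule) instead of using the closed rational forms. The function names are illustrative, and the sample points are arbitrary choices.

```python
import math

def solve_b(alpha, lam):
    """Solve the two linear relations for b0 = y(alpha), b1 = alpha*y'(alpha)."""
    # First relation (n = 2 quadrature applied to y''):
    a11 = 6 * alpha**2 * lam + alpha**3 * lam
    a12 = 12 + 6 * alpha + (1 - lam) * alpha**2
    r1 = 12 * alpha - 6 * alpha**2 + (1 - lam) * alpha**3
    # Second relation (n = 2 quadrature applied to y'):
    a21 = 12 - alpha**2 * lam
    a22 = -(6 + alpha)
    r2 = 6 * alpha - alpha**2
    det = a11 * a22 - a12 * a21
    b0 = (r1 * a22 - a12 * r2) / det
    b1 = (a11 * r2 - a21 * r1) / det
    return b0, b1

def y_exact(x):
    """Explicit solution for lambda = -2."""
    return (math.exp(x) - math.exp(-2 * x)) / 3

def dy_exact(x):
    return (math.exp(x) + 2 * math.exp(-2 * x)) / 3

for x in (0.25, 0.5, 1.0):
    b0, b1 = solve_b(x, -2.0)
    # b1 / x recovers the derivative in the original variable.
    print(x, b0, y_exact(x), b1 / x, dy_exact(x))
```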

Generally, if this quadrature method is applied to an analytical 
solution of a given boundary value problem in the realm of ordinary 
differential equations, our procedure can be described as follows. 
First the given realm is normalized to [0,1] by a proper scale 
transformation. Then we make a list of the boundary values 


$$y(0), \ y'(0), \ y''(0), \ \ldots$$
and
$$y(1), \ y'(1), \ y''(1), \ \ldots$$


up to the point where the higher derivatives are already determined 
by the given differential equation. Some of these boundary values 
are prescribed on account of the given boundary conditions. The 
others are replaced by letter symbols. If the differential equation is 
homogeneous, one of the boundary values can be normalized to 1.
Now the quadrature formula (16) comes into operation, establishing
a linear relation between the unknown boundary data. Repeating
the quadrature formula for derivatives of lower and lower order,
eventually all the boundary data become determined, on the basis
of a system of linear equations. If an eigenvalue $\lambda$ is involved,
elimination of the unknown boundary data results in an algebraic
equation for $\lambda$, whose absolutely smallest root we retain.

At this point we have an initial value problem, since $y(0)$ and all its
higher derivatives at $x = 0$ are already known. Now the transformation
$x = \alpha x_1$ follows, considering $\alpha$ a given constant parameter.
Applying the quadrature formula in the previous manner to the new
differential equation, we eventually obtain $y(\alpha)$, and independently
$y'(\alpha), y''(\alpha), \ldots$, from the given initial values.

As a valuable check we can proceed similarly from the other end
point $x = 1$ and compare the new $y(\alpha)$, obtained in terms of the
end values $y(1), y'(1), \ldots$, with the previous $y(\alpha)$ obtained in terms of
the initial values $y(0), y'(0), \ldots$. The degree of agreement will give a
good measure of the accuracy of the solution.
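The quadrature step that this procedure repeats can be sketched as follows. The closed form used for the weights is an assumption not quoted from this section: it is the standard expression for the two-point formula based on end-point derivatives, and it reproduces the weights $6/12, 1/12$ and $60/120, 12/120, 1/120$ used in the worked example. The function names are illustrative.

```python
from fractions import Fraction
from math import factorial, e

def two_point_weights(n):
    """Weights c_k, k = 0..n-1, of the formula
    int_0^1 f dx ~ sum_k c_k * [f^(k)(0) + (-1)^k f^(k)(1)].
    Closed form assumed here; it matches the n = 2 and n = 3 cases of the text."""
    return [Fraction(factorial(n) * factorial(2 * n - 1 - k),
                     factorial(2 * n) * factorial(k + 1) * factorial(n - 1 - k))
            for k in range(n)]

def two_point_quadrature(derivs0, derivs1):
    """Integral over [0, 1] from f, f', f'', ... given at x = 0 and x = 1."""
    n = len(derivs0)
    total = 0
    for k, c in enumerate(two_point_weights(n)):
        total += c * (derivs0[k] + (-1) ** k * derivs1[k])
    return total

print(two_point_weights(2), two_point_weights(3))
# f(x) = e^x: every derivative is 1 at x = 0 and e at x = 1.
print(two_point_quadrature([1.0] * 3, [e] * 3), e - 1)
```

With $n$ data at each end the formula is exact for polynomials up to degree $2n - 1$, which is why so few boundary derivatives already give usable eigenvalues.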


22. Convergence of the quadrature based on boundary values. We 
will now investigate the general convergence properties of this 
quadrature procedure. For this purpose we will use a method 
which is very similar to that utilized before, when dealing with the 
convergence of equidistant polynomial interpolation, (cf. V, 15). 
We assume the analytical character of f(x) in the interval [0,1] and a 
certain domain of the complex plane surrounding this interval. 
Then we can represent f(x) with the help of Cauchy’s loop integral 
(5-15.9). This makes it possible that the entire investigation shall be
restricted to the special function $(z_0 - x)^{-1}$, where $z_0$ is some fixed
point of the complex plane. If we can show that the quadrature
method converges for this particular function, provided that $z_0$
stays outside of a certain well circumscribed domain of the complex 
plane, then the convergence is assured for any analytical function 
f(z) which stays analytic inside and on the boundaries of that 
domain. 

In (19.9) the following form of the remainder was developed:
$$\eta_n = \frac{1}{(2n)!}\int_0^1 f^{(2n)}(x)\,[x(1 - x)]^n\,dx \tag{6-22.1}$$
For the special function
$$f(x) = \frac{1}{z_0 - x} \tag{6-22.2}$$
we get
$$\frac{f^{(2n)}(x)}{(2n)!} = \frac{1}{(z_0 - x)^{2n+1}} \tag{6-22.3}$$
and thus
$$\eta_n = \int_0^1 \frac{1}{z_0 - x}\left[\frac{x(1 - x)}{(z_0 - x)^2}\right]^n dx \tag{6-22.4}$$
Our aim is to show that with $n$ increasing to infinity, $\eta_n$ converges to
zero. We notice that the decisive quantity is the absolute value of
the ratio whose $n$th power appears in the integrand:
$$\lambda(x) = \left|\frac{x(1 - x)}{(z_0 - x)^2}\right| \tag{6-22.5}$$
If $\lambda(x)$ stays everywhere smaller than 1, the gradual decrease of
$\eta_n$ to zero is assured.

Now it is possible that $z_0$ is so far away from the interval $[0, 1]$
of the $x$-axis that $\lambda(x)$ remains smaller than 1 throughout the given
interval, in which case the convergence is already guaranteed and no
further discussion is demanded. But it is also possible that $z_0$
approaches the $x$-axis to a degree that $\lambda(x)$ grows beyond 1 in a
certain portion of the critical interval. Let us assume that $z_0$ has the
form
$$z_0 = \alpha - i\beta \tag{6-22.6}$$
where $\beta$ is positive. Hence $z_0$ is some complex point below the
$x$-axis. We start from the point $x = 0$ where $\lambda(x)$ is zero and proceed
up to the point $P_1$ where $\lambda(x)$ becomes 1. Similarly we start
symmetrically from the point $x = 1$ (where $\lambda(x)$ is also zero) and go
backward until a point $P_2$ is reached where again $\lambda(x)$ becomes 1.
We have difficulty only between the points $P_1$ and $P_2$ since the
contribution of the intervals $0P_1$ and $P_2 1$ converges to zero. But now
we will make use of the well-known property of analytic functions
that the path of integration can be deformed in any way we like, as
long as singular points of the integrand are avoided. Hence we will
replace $x$ by the complex variable $z = x + iy$ and choose our path
somewhere in the upper half of the complex plane in which the
integrand is free of any singularities. We first establish the geometrical
locus of all the points in which $\lambda(z)$ becomes 1. This gives
the condition
$$[x(1 - x) + y^2]^2 + (1 - 2x)^2 y^2 = [(\alpha - x)^2 + (\beta + y)^2]^2 \tag{6-22.7}$$
which may be conceived as a cubic equation in $y$ with the cubic term
$4\beta y^3$ and the absolute term
$$[(\alpha - x)^2 + \beta^2]^2 - x^2(1 - x)^2$$
In the critical interval between $P_1$ and $P_2$ this quantity becomes
negative. But then the cubic equation must have a real positive root
for every $x$ of the critical interval. We thus obtain in the upper
half-plane a definite convex curve between $P_1$ and $P_2$, inside of which
$\lambda(z)$ is larger than 1, outside of which $\lambda(z)$ is smaller than 1. By
choosing our path of integration outside of the critical curve, we now
have a finite path along which $\lambda(z) < 1$ and thus we have demonstrated
that $\eta_n$ converges to zero.
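The decay of the remainder can be observed numerically in the simplest situation, where $z_0$ is real and lies far enough to the right of the interval that $\lambda(x) < 1$ throughout. The Python sketch below evaluates the integral (4) by Simpson's rule; the choice $z_0 = 1.5$, the number of panels, and the function name are assumptions made for illustration only.

```python
def remainder(n, z0, panels=2000):
    """Simpson estimate of eta_n = int_0^1 [x(1-x)]^n / (z0 - x)^(2n+1) dx
    for a real z0 outside [0, 1] (panels must be even)."""
    h = 1.0 / panels

    def integrand(x):
        return (x * (1 - x)) ** n / (z0 - x) ** (2 * n + 1)

    total = integrand(0.0) + integrand(1.0)
    for j in range(1, panels):
        total += (4 if j % 2 else 2) * integrand(j * h)
    return total * h / 3

# For z0 = 1.5 the ratio lambda(x) never exceeds 1/3 on [0, 1],
# so eta_n should shrink at least like (1/3)^n.
etas = [remainder(n, 1.5) for n in range(1, 7)]
for n, eta in enumerate(etas, start=1):
    print(n, eta)
```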

The only case we have not covered yet is the choice $\beta = 0$. Then
$z_0 = \alpha$ lies on the $x$-axis and the integrand is real. We consider the ratio
$$\lambda(x) = \frac{x(1 - x)}{(\alpha - x)^2}$$
The maximum occurs at the point
$$x = \frac{\alpha}{2\alpha - 1}$$
At this point
$$\lambda = \frac{1}{4\alpha(\alpha - 1)}$$
The condition that this ratio shall remain smaller than 1 gives two
bounds for $\alpha$, viz., on the positive side:
$$\alpha > \frac{1 + \sqrt{2}}{2} = 1.20711 \tag{6-22.8}$$
and on the negative side:
$$\alpha < \frac{1 - \sqrt{2}}{2} = -0.20711 \tag{6-22.9}$$


The result of our investigation can be summarized as follows. 
We extend the interval $[0, 1]$ of the $x$-axis symmetrically on both sides
by the amount of 0.207 beyond the terminal points 0, 1. This line can be
surrounded by a fence of arbitrarily small width. If $f(z)$ is analytical
in this infinitesimal domain, the convergence of the quadrature formula is
secured.

[Figure: the $x$-axis with the marked points $-0.207$, $0$, $1$, $1.207$.]
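The bounds (8) and (9) can be checked with a few lines of Python. The sketch compares the stated location and value of the maximum with a brute-force scan of the ratio over $[0, 1]$; the sample value $\alpha = 1.5$ and the grid size are arbitrary choices.

```python
import math

def ratio(x, alpha):
    """lambda(x) = x(1-x)/(alpha-x)^2 for a real pole z0 = alpha."""
    return x * (1 - x) / (alpha - x) ** 2

def max_ratio(alpha, samples=20001):
    """Brute-force maximum of the ratio on an equidistant grid over [0, 1]."""
    return max(ratio(j / (samples - 1), alpha) for j in range(samples))

alpha = 1.5
x_star = alpha / (2 * alpha - 1)            # stated location of the maximum
peak = 1 / (4 * alpha * (alpha - 1))        # stated value of the maximum
print(x_star, ratio(x_star, alpha), peak, max_ratio(alpha))

# The ratio reaches 1 exactly when 4*alpha*(alpha - 1) = 1.
alpha_crit = (1 + math.sqrt(2)) / 2
print(alpha_crit, 1 / (4 * alpha_crit * (alpha_crit - 1)))
```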


Bibliographical References 
[1] Cf. Ref. {11}, Chapter IX.

[2] Lowan, A. N., Davids, N., and Levenson, A., Table of the
zeros of the Legendre polynomials of order 1-16 and the weight
coefficients for Gauss’ mechanical quadrature formula, Bull.
Amer. Math. Soc., 48, 739 (1942).

[3] Milne, W. E., Numerical Solution of Differential Equations
(Wiley, New York, 1953).


VII 


POWER EXPANSIONS 


1. Historical introduction. The powers of a variable x appeared 
originally purely in algebraic problems. With the development of 
calculus the great importance of power expansions became evident. 
The expansion discovered by Taylor (1715) and by Maclaurin (1742) 
enables us to predict the course of a function if we know the value 
of the function and all its derivatives in one particular point. The 
“Taylor series” thus became one of the cornerstones of analytical 
research and was particularly useful in establishing the existence of 
solutions of differential equations. From the middle of the nineteenth 
century on, a more cautious attitude toward power expansions 
became noticeable. The mere existence of a Taylor expansion does not 
prove that this series has any inner affinity to the function it represents. 
If, on the other hand, the series has no other purpose than numerical 
evaluation of the function, the degree of convergence has to be 
investigated. The Taylor expansion may converge in the entire 
complex plane or within a given circle only, and it may diverge even 
at every point. It was recognized, however, that a more liberal 
formulation of the question of convergence greatly increases the 
usefulness of an expansion. One can make good use, for example, of 
“semiconvergent expansions” which actually diverge if we increase 
the number of terms to infinity, but converge in the beginning, thus 
allowing evaluation of the function with a certain limited accuracy 
which cannot be surpassed, since the error of the truncated series 
decreases to a certain minimum and then increases again. Much 
attention was paid also to the problem of inventing methods of 
summing a series in such a way that it shall become convergent, 
although the original series, if added term by term, increased to 
infinity. 



With the development of the theory of orthogonal expansions the 
realization came that occasionally power expansions whose co- 
efficients are not determined according to the scheme of Taylor can 
operate much more effectively than the Taylor series itself. Such 
expansions are not based on the process of successive differentiation 
but on integration. A large class of functions which are not sufficiently 
analytic to allow a Taylor expansion can be represented by such 
orthogonal expansions. The realm of power expansions is thus 
extended far beyond the family of analytical functions. But even for 
analytical functions we may gain in convergence if we do not employ 
the powers directly but in the form of polynomials which are members 
of an orthogonal set of functions. These expansions belong to a 
given definite real realm of the variable x, and our aim is to approxi- 
mate a function in such a way that the error shall not become too 
small or too large at any particular point of the range, but rather of 
the same order of magnitude all over the range. The gain in com- 
parison with the Taylor series arises from the fact that we sacrifice 
in accuracy at the point where the Taylor series gave very accurate 
results but reduce the error in the peripheral regions where the error
of the Taylor expansion became intolerably large. 

The theory of orthogonal expansions is based on the theory of 
least squares developed by Gauss and Legendre. Many orthogonal 
function systems were known through the study of the Laplacian 
operator (potential equation) which played a fundamental role not 
only in astronomy but also in electricity and magnetism. The 
Fourier series was the first classical example of an orthogonal expan- 
sion. However, the nature of orthogonal expansions was not 
recognized until the great acoustical investigations of Lord Rayleigh 
(Theory of Sound, 1894), who has to be considered as the true founder 
of the theory of orthogonal expansions. Fredholm’s integral 
equation (1900) and the subsequent investigations of Hilbert (1910) 
gave a tremendous impetus to the broader understanding of the 
problems of orthogonality and their relation to metrical geometry. 

In the present chapter we will be particularly interested in the 
question of rapidly convergent power expansions. Mere convergence 
of an expansion, valuable as it is from the purely analytical stand- 
point, is of little practical use if the number of terms demanded for a 
reasonable accuracy is very large. Methods can be developed, 
however, by which a long power expansion may be “telescoped” into
a much shorter series of only a few terms. A finite power series cannot
be replaced by any other series if absolute accuracy is required.
If, however, we are satisfied with a given limited accuracy of, let us 
say, 5%, it can easily happen that a long power expansion of, let us 
say, 50 terms may be replaceable by another expansion of only 5 
terms. This “telescoping” of a power series is not obtained by 
merely omitting the terms from a certain point on, but by rearranging 
the series according to a certain pattern. Another possibility is 
that we make use of the differential equation which defines a certain 
function and take care in advance that in solving this differential 
equation by a finite power expansion the errors shall be evenly 
distributed over the entire range. In these investigations a certain 
outstanding class of polynomials, introduced by the Russian mathe- 
matician Chebyshev (in 1864) and called “Chebyshev polynomials,” 
play a fundamental role. 


2. Analytical extension by reciprocal radii. The Taylor expansion 
of an analytical function w = f(z) has a certain radius of convergence 
which is determined by the analytical nature of the function f(z). If 
$f(z)$ becomes infinite at a certain point $z = z_0$ or has some other
“singularity” at that point which is not in harmony with the general
conditions of an analytical behaviour, the infinite series
$$w = a_0 + a_1 z + a_2 z^2 + \cdots \tag{7-2.1}$$
cannot converge beyond the circle $|z| = |z_0|$. But the series (1) will
converge within that circle if we know that $z = z_0$ is the nearest point
where the analytical behavior of $f(z)$ is violated. On the circle itself
the series may or may not converge, and requires special investigation.
As an example we consider the two functions
$$\text{(a)} \quad w = e^{z}$$
$$\text{(b)} \quad w = \frac{1}{1 + e^{z}} \tag{7-2.2}$$
The first function remains finite and analytical at all points of 
the complex plane z = x + iy. Hence the Taylor expansion around 
the origin must remain convergent for any finite value of $z$. A
function of this kind is called an “entire function.” The second
function becomes infinite at the points
$$z = \pm\pi i, \ \pm 3\pi i, \ \pm 5\pi i, \ \ldots$$
The nearest singularity is at the point $z = \pm i\pi$. This decides that the
Taylor expansion (2.1) of this function converges only inside the
circle $|x + iy| = \pi$.

It is frequently possible to obtain the coefficients of the Taylor 
expansion on the basis of the defining equation, without going through 
the process of successive differentiations. For example, the equation 
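(2b) may be written as follows.
$$(1 + e^{z})\,w = 1$$
and expanding both factors in a Taylor series, we have
$$\left(2 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots\right)\left(a_0 + a_1 z + a_2 z^2 + \cdots\right) = 1$$
If we carry through the multiplications term by term, we obtain
successive recursions for the determination of the $a_k$:
$$2a_0 = 1 \qquad (a_0 = \tfrac{1}{2})$$
$$2a_1 + a_0 = 0 \qquad (a_1 = -\tfrac{1}{4})$$
$$2a_2 + a_1 + \frac{a_0}{2} = 0 \qquad (a_2 = 0)$$
$$2a_3 + a_2 + \frac{a_1}{2} + \frac{a_0}{6} = 0 \qquad (a_3 = \tfrac{1}{48})$$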
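The same recursion is easily mechanized. The following Python sketch, using exact rational arithmetic, generates as many coefficients as desired; the function name is illustrative.

```python
from fractions import Fraction
from math import factorial

def taylor_coeffs(count):
    """Coefficients a_k of w = 1/(1 + e^z), obtained from (1 + e^z) w = 1."""
    a = []
    for k in range(count):
        rhs = Fraction(1 if k == 0 else 0)
        # Coefficient of z^k: 2*a_k + sum_{j=1..k} a_{k-j} / j! = rhs
        s = sum(a[k - j] / factorial(j) for j in range(1, k + 1))
        a.append((rhs - s) / 2)
    return a

print(taylor_coeffs(6))
```

The next two coefficients produced in this way are $a_4 = 0$ and $a_5 = -\tfrac{1}{480}$.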


Since, however, our series is not valid outside of a definite circle, 
the question can be raised whether by some method we could not go 
beyond the circle of convergence. One method of great analytical 
but little practical significance is the method of “analytical continua- 
tion” by local expansions. In this method we choose a point still 
inside the radius of convergence, e.g., the point $z = \tfrac{1}{2}\pi$. On the basis
of the Taylor series we obtain the function and all its derivatives at
that point and now choose this point as the center of a new Taylor
expansion. The radius of the new circle of convergence is again
determined by the nearest singularity, i.e., the points $z = \pm i\pi$, and
we see that we now cover ground which was not included in the
previous circle. By repeated applications of this method we can
extend the definition of a function, originally given in terms of an 
infinite expansion (1), and restricted to a definite circle, to larger and 
larger portions of the complex plane. 

A numerically much more convenient method can be derived from 
the “transformation by reciprocal radii.” Many of the important 
transcendental functions of applied analysis have the property that 
they are analytical in the entire right complex half plane, although 
they have a singularity at the point z = 0. Let us assume that we 
possess the Taylor expansion of this function about a certain point 
z = p, where p is a positive real number. 


$$w = f(z) = a_0 + a_1(z - p) + a_2(z - p)^2 + \cdots \tag{7-2.3}$$
This expansion converges for all $z$ which are inside the circle $|z - p| = p$.
We now apply the transformation
$$\xi = \frac{z - p}{z + p} \tag{7-2.4}$$
This transformation has the property that the entire right half plane
of the variable $z$ is mapped inside the unit circle $|\xi| = 1$ of the new
variable $\xi$. Hence the function $w = f(z)$, considered as a function of
the variable $\xi$, has no singularity inside the unit circle, and allows
there an expansion of the form
$$w = b_0 + b_1\xi + b_2\xi^2 + \cdots \tag{7-2.5}$$
The radius of convergence of this expansion is $|\xi| = 1$. Hence the
series (5) allows evaluation of $w(\xi)$ with arbitrary accuracy at any
point $\xi$ whose absolute value is less than 1, and that means any
point $z$ which is in the right complex half plane.

Now the series (3) and the series (5) are in close relation to each
other. We have
$$z - p = 2p\,\frac{\xi}{1 - \xi} \tag{7-2.6}$$
We write this transformation in two steps:
$$z - p = 2pt \qquad \text{and} \qquad t = \frac{\xi}{1 - \xi} \tag{7-2.7}$$
The first transformation gives
$$w = a_0 + 2p\,a_1 t + (2p)^2 a_2 t^2 + \cdots = a_0' + a_1' t + a_2' t^2 + \cdots \tag{7-2.8}$$
where
$$a_k' = (2p)^k a_k \tag{7-2.9}$$
The second transformation means that the term $a_k' t^k$ is to be replaced
by an infinite series:
$$a_k' t^k = a_k'\,\frac{\xi^k}{(1 - \xi)^k} = a_k' \sum_{\alpha=0}^{\infty} \binom{k + \alpha - 1}{\alpha}\,\xi^{k+\alpha} = a_k' \sum_{m=k}^{\infty} \binom{m - 1}{k - 1}\,\xi^m \tag{7-2.10}$$
This shows that the rearrangement of the series (3) into a series in $\xi$
gives the following values for the coefficients $b_k$.
$$b_{k+1} = \sum_{\alpha=0}^{k} \binom{k}{\alpha}\,a_{\alpha+1}' \tag{7-2.11}$$
or written out in detail
$$b_0 = a_0$$
$$b_1 = a_1'$$
$$b_2 = a_1' + a_2'$$
$$b_3 = a_1' + 2a_2' + a_3' \tag{7-2.12}$$
$$b_4 = a_1' + 3a_2' + 3a_3' + a_4'$$

The coefficients of the expansion (5) are thus available by a simple
binomial weighting of the coefficients $a_k'$.
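The two operations, scaling by powers of $2p$ and binomial weighting, can be sketched in Python as follows. The function name is illustrative, and the test function $f(z) = 1/z$ expanded about $p = 1$ (coefficients $a_k = (-1)^k$) is chosen here only because its $\xi$-series, $(1 - \xi)/(1 + \xi) = 1 - 2\xi + 2\xi^2 - \cdots$, is known in closed form.

```python
from math import comb

def reciprocal_radii_coeffs(a, p):
    """Map Taylor coefficients a_k about z = p to the coefficients b_m
    of the series in xi = (z - p)/(z + p)."""
    scaled = [(2 * p) ** k * ak for k, ak in enumerate(a)]   # a'_k = (2p)^k a_k
    b = [scaled[0]]                                          # b_0 = a_0
    for m in range(1, len(a)):
        # b_m = sum_{k=1..m} C(m-1, k-1) * a'_k
        b.append(sum(comb(m - 1, k - 1) * scaled[k] for k in range(1, m + 1)))
    return b

# f(z) = 1/z about p = 1 has a_k = (-1)^k.
a = [(-1) ** k for k in range(6)]
print(reciprocal_radii_coeffs(a, 1))
```

Each $b_m$ depends only on $a_1', \ldots, a_m'$, so truncating the Taylor series does not falsify the coefficients already obtained.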

We have thus extended the validity of a local expansion to a much
larger range. The original expansion (3) converged only within the
circle $|z - p| = p$ and was unsuitable for evaluation of $f(z)$ at any point
$z$ which lies outside of that circle. We now transformed these
coefficients by two simple operations. First we multiplied the successive
coefficients by the successive powers of $2p$. Then we weighted
these coefficients, starting with $a_1'$, binomially. The new coefficients
give us a new series by which $f(z)$ can be evaluated at any point $z$
which lies somewhere in the right half plane. The variable of this
new series is not the original $z$ but
$$\xi = \frac{z - p}{z + p} \tag{7-2.13}$$
which for all permissible $z$ remains in absolute value smaller than
(or in the limit equal to) unity.

This method can be useful in many problems. It so happens that
almost all the fundamental transcendentals encountered in the
mathematical description of physical events (such as the Bessel
functions, the Gaussian error function, the gamma function, the
exponential integral) have the property that the bulk of the function
can be approximated by a simple exponential function, and what
remains is an amplitude factor which does not change radically for
any point $z$ of the complex right plane. This amplitude factor is still
analytical throughout the complex half plane, and becomes singular
only at $z = 0$ and $z = \infty$. Moreover, the defining differential
equation or functional equation permits us to obtain the successive
derivatives of this amplitude function at a suitably chosen point $z = p$.
We now have a method by which that function can be analytically
represented and numerically evaluated for all the $z$ points of interest,
in terms of the coefficients of that single expansion.


3. Numerical example. We demonstrate the operation of the 
method by applying it to a simple but characteristic example, viz., to 
the “exponential integral”:
$$w(z) = \int_z^{\infty} \frac{e^{-t}}{t}\,dt \tag{7-3.1}$$
The defining differential equation here is
$$w'(z) = -\frac{e^{-z}}{z} \tag{7-3.2}$$
The method of integrating by parts reveals that for large values of $z$
we will have practically
$$w(z) \approx \frac{e^{-z}}{z}$$


Hence we will put
$$w(z) = \frac{e^{-z}}{z}\,u(z) \tag{7-3.3}$$
and consider $u(z)$ our unknown function:
$$u(z) = z e^{z} w(z) \tag{7-3.4}$$
By substitution into (2) we obtain the defining equation for $u(z)$:
$$u' - \frac{1 + z}{z}\,u = -1 \qquad \text{or} \qquad -zu' + (1 + z)u = z$$
We want to expand about the point $p = 1$. Hence we put
$$z = 1 + t, \qquad -(1 + t)u' + (2 + t)u = 1 + t \tag{7-3.5}$$


If we substitute in this equation for $u$ the expansion
$$u = a_0 + a_1 t + a_2 t^2 + \cdots \tag{7-3.6}$$
and tabulate the coefficients, we obtain in succession
$$a_1 = -1 + 2a_0, \qquad a_5 = \tfrac{1}{5}(a_3 - 2a_4)$$
$$a_2 = \tfrac{1}{2}(-1 + a_0 + a_1), \qquad a_6 = \tfrac{1}{6}(a_4 - 3a_5)$$
$$a_3 = \tfrac{1}{3}a_1, \qquad a_7 = \tfrac{1}{7}(a_5 - 4a_6)$$
$$a_4 = \tfrac{1}{4}(a_2 - a_3) \tag{7-3.7}$$
All coefficients are thus uniquely determined if $a_0$ is given. We take
the value of $a_0$ from a table.¹
$$a_0 = e\int_1^{\infty} \frac{e^{-t}}{t}\,dt = 0.596347361$$
Successive substitutions give
$$a_1 = 0.192694722, \qquad a_5 = 0.029817368$$
$$a_2 = -0.105478958, \qquad a_6 = -0.021979956$$
$$a_3 = 0.064231574, \qquad a_7 = 0.016819599$$
$$a_4 = -0.042427633 \tag{7-3.8}$$


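The next step is to multiply by the successive powers of $2p$, i.e., in our
case by powers of 2. These $a_k'$ coefficients are then weighted by the
binomial coefficients (the column under $b_m$ lists the weights that enter $b_m$):
$$\begin{array}{lr|rrrrrrr}
 & & b_1 & b_2 & b_3 & b_4 & b_5 & b_6 & b_7 \\
\hline
a_1' = & 0.385389444 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
a_2' = & -0.421915834 & & 1 & 2 & 3 & 4 & 5 & 6 \\
a_3' = & 0.513852592 & & & 1 & 3 & 6 & 10 & 15 \\
a_4' = & -0.678842130 & & & & 1 & 4 & 10 & 20 \\
a_5' = & 0.954155779 & & & & & 1 & 5 & 15 \\
a_6' = & -1.406717197 & & & & & & 1 & 6 \\
a_7' = & 2.152908672 & & & & & & & 1
\end{array} \tag{7-3.9}$$

¹ See footnote to IV, 21.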


The resultant series (truncated to 8 terms) becomes
$$u(\xi) = 0.5963 + 0.3854\xi - 0.0365\xi^2 + 0.0554\xi^3 - 0.0176\xi^4 + 0.0196\xi^5 - 0.0100\xi^6 + 0.00978\xi^7 \tag{7-3.10}$$
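The whole chain (recursion, scaling by powers of 2, binomial weighting) can be reproduced with the short Python sketch below. The only input is the tabulated value of $a_0$ quoted in the text; the variable names are illustrative.

```python
from math import comb

a = [0.596347361]                     # a_0 = u(1), tabulated value quoted in the text
a.append(-1 + 2 * a[0])               # a_1
a.append(0.5 * (-1 + a[0] + a[1]))    # a_2
for k in range(2, 7):
    # Coefficient of t^k in (5): (k+1) a_{k+1} = a_{k-1} - (k-2) a_k
    a.append((a[k - 1] - (k - 2) * a[k]) / (k + 1))

scaled = [2 ** k * ak for k, ak in enumerate(a)]   # p = 1, hence 2p = 2
b = [scaled[0]] + [
    sum(comb(m - 1, k - 1) * scaled[k] for k in range(1, m + 1))
    for m in range(1, 8)
]
for m, bm in enumerate(b):
    print(m, round(bm, 5))
```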


This series has slow convergence if we approach the unit circle
$|\xi| = 1$. An approximation of this kind is not characterized by
excessive accuracy, and we possess other methods for approximation 
of the exponential integral, which give much higher precision. But 
these other methods make use of the differential equation which 
defines the function u(z), while the expansion by reciprocal radii is 
available under much more general conditions. The value of the 
method lies in the fact that by a simple weighting of a local Taylor 
expansion a new expansion is obtained which represents the desired 
function u(z) in a large domain of z by a simple analytical expression 
of moderate accuracy. 

The maximum error of the expansion (10) can be estimated as
follows. The singularity of $u(z)$ lies at the point $z = 0$, i.e., $\xi = -1$.
Here $u(z)$ does not go to infinity but to zero with the strength
$z \log z$. We can thus expect that the series does not fail even on the
limiting circle $|\xi| = 1$, which corresponds to the imaginary axis of
the variable $z$. The convergence will be slowest for the singular point
$\xi = -1$. Here the series (10) gives
$$u(-1) = 0.0619$$
Hence we can estimate that the absolute error of $u(\xi)$ will nowhere
surpass 0.062.


As a numerical experiment let us obtain from our approximation
the value of $w(z)$ at the point $z = i$. The corresponding value of $\xi$ is
$$\xi = \frac{i - 1}{i + 1} = i$$
We are thus on the limiting circle. Substituting in (10), we obtain
$$u(i) = 0.6253 + 0.3398i$$
and going back to the original function $w(z)$ according to (3):
$$w(i) = -\mathrm{Ci}(1) - i\left(\frac{\pi}{2} - \mathrm{Si}(1)\right) = \frac{1}{i}(\cos 1 - i\sin 1)\,u(i)$$
$$= -(0.6253 + 0.3398i)(0.8415 + 0.5403i)$$
$$= -0.3425 - 0.6238i$$
from which
$$\mathrm{Ci}(1) = 0.3425, \qquad \mathrm{Si}(1) = 0.9470$$
while the actual values are
$$\mathrm{Ci}(1) = 0.3374, \qquad \mathrm{Si}(1) = 0.9461$$
We thus get an accuracy of 1%.
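The experiment can be repeated for other points of the right half plane with the following Python sketch. It uses the four-decimal coefficients of (10) as printed; the function names are illustrative.

```python
import cmath
import math

# Coefficients of the truncated series (10), as printed.
B = [0.5963, 0.3854, -0.0365, 0.0554, -0.0176, 0.0196, -0.0100, 0.00978]

def u_series(z, p=1.0):
    """Amplitude factor u(z) from the xi-series, with xi = (z - p)/(z + p)."""
    xi = (z - p) / (z + p)
    return sum(bk * xi ** k for k, bk in enumerate(B))

def w_approx(z):
    """Exponential integral w(z) = exp(-z)/z * u(z), according to (3)."""
    return cmath.exp(-z) / z * u_series(z)

w = w_approx(1j)
ci = -w.real                  # w(i) = -Ci(1) - i*(pi/2 - Si(1))
si = math.pi / 2 + w.imag
print(u_series(1j), w, ci, si)
```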


4. The convergence of the Taylor series. The coefficients of the 
Taylor series are obtained by successive differentiations at a definite 
point $z = z_0$, the center of expansion. Since the derivatives of
a differentiable function are all limits of difference coefficients, taken
for an $h$ which converges to zero, we see that the coefficients of the
Taylor series are uniquely determined if the function is known in an
arbitrarily small neighborhood of the point $z = z_0$. We thus obtain
the remarkable result that the course of an analytical function can be 
predicted by knowing the function only in the small. The Taylor 
series is thus an extrapolating and not an interpolating series. Since 
interpolation is always safer than extrapolation, we would think 
that a series based on interpolation would give much better results, 
due to stronger convergence, than the Taylor series. While this 
viewpoint is correct, as the following discussions will demonstrate, it 
is true only if we are interested in a one-dimensional range of the 
variable $z$; the situation is different, however, if we pay our attention
to the complete two-dimensional area included by the circle of
convergence. In this range the validity of the Taylor series is derived
from Cauchy’s integral theorem:
$$f(z) = \frac{1}{2\pi i}\oint \frac{f(\zeta)}{\zeta - z}\,d\zeta \tag{7-4.1}$$
where the contour integral is extended over the boundary values of
$f(z)$ along the limiting circle. In this formulation the values of $f(z)$ are
known inside a circle if the boundary values on the circle are prescribed.
It is only because of the analytical nature of $f(z)$ that this
integral is replaceable by an infinite convergent power expansion
whose coefficients are obtainable by the successive derivatives of
$f(z)$ at the center of the circle. If $z - z_0$ is written in polar form:
$$z - z_0 = re^{i\theta} \tag{7-4.2}$$
the Taylor series becomes
$$f(z) = \sum_{k=0}^{\infty} a_k r^k e^{ik\theta} \tag{7-4.3}$$

Let us fix our attention on the values of $f(z)$ along a circle of fixed
radius $r = r_0$. Then $f(z)$ appears in the form of a Fourier series in
the variable $\theta$. But this series is already an orthogonal expansion
which approximates in the “best” way in the sense of least squares.
Hence we cannot hope to improve on the convergence of the Taylor
series, if our aim is to obtain a valid expansion everywhere inside of a
circle. Given the condition that $f(z)$ shall be approximated by a
power series of $n$th order, where $n$ is a prescribed positive integer, and
given a circular analytical domain in which our expansion shall hold, 
we can find no series which would be preferable to the truncated 
Taylor series by giving smaller errors throughout the given domain. 


5. Rigid and flexible expansions. The great analytical value of 
the Taylor series and the admirable properties of the Fourier series 
led to overemphasis of a definite kind of limit procedure. In this 
procedure we consider an infinite series of the following kind. 


$$f(x) = a_1\varphi_1(x) + a_2\varphi_2(x) + \cdots \tag{7-5.1}$$


The meaning of this ‘‘dot-dot-dot writing” is not immediately clear, 
since addition of an infinite number of terms cannot be taken 


§ 5 Rigid and Flexible Expansions 449 


literally. By tacit assumption we agree that neither the “equal” 
sign nor the summation on the right side shall be meant in the 
algebraic sense. The right side points out a successive procedure, 
while the equal sign refers to a certain relation of this procedure to 
the given f(x). The procedure is as follows. Take n terms of the 
series for a given x. 


$s_n(x) = a_1\varphi_1(x) + a_2\varphi_2(x) + \cdots + a_n\varphi_n(x)$    (7-5.2)
Then add one more term:


$s_{n+1}(x) = s_n(x) + a_{n+1}\varphi_{n+1}(x)$


Then again add one more term:


$s_{n+2}(x) = s_{n+1}(x) + a_{n+2}\varphi_{n+2}(x)$


Proceed in this fashion, always adding one more term. This defines 
the procedure for any x we may choose. 

The equal sign between the left and the right side expresses two 
independent facts: 

1. The sums $s_n$, $s_{n+1}$, $s_{n+2}$, ⋯ approach a definite limit. This means
that there exists a certain number l(x) which is approached by the
$s_n(x)$ in such a way that eventually, for sufficiently large n, the
difference between l(x) and $s_n(x)$ can be made in absolute value as
small as we wish, although generally not zero. This l(x) is called the
“limit of $s_n(x)$, as n increases to infinity.”

2. In a certain given interval of x this l(x) coincides with f(x).

While an infinite limit process of this kind is very valuable and 
received historically overwhelming eminence, it is nevertheless 
characterized by one serious drawback which ties our hands considerably.
In going from n to n + 1, n + 2, ⋯ we constantly add one more
term, without changing anything in the sum we have obtained before.
This puts a heavy responsibility on us when we write the term
$a_n\varphi_n(x)$, because we are not going to use that term only for the
$s_n(x)$ when we first encounter it, but for all the later sums
$s_{n+1}(x)$, $s_{n+2}(x)$, ⋯, to all
eternity. It is conceivable that we could gain much more in effective
approximations if we were entitled to do two things in going from n
to n + 1: add one more new term and change all the coefficients we
had before. In this case the one-dimensional “rigid” sequence of
coefficients

$a_1, a_2, a_3, \cdots$




would be changed to a two-dimensional sequence, since now the 
coefficients of the expansion (1) will depend on n, the number of 
terms. Hence now we obtain a matrix scheme of the following 
structure. 

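$\begin{array}{l} a_{11} \\ a_{21},\ a_{22} \\ a_{31},\ a_{32},\ a_{33} \\ \cdots \end{array}$    (7-5.3)


The sum $s_n(x)$ is now to be formed as follows.


$s_n(x) = a_{n1}\varphi_1(x) + a_{n2}\varphi_2(x) + \cdots + a_{nn}\varphi_n(x)$    (7-5.4)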


Again we proceed from n to n + 1, to n + 2, ⋯, and again our aim
is that $s_n(x)$ shall approach f(x) with ever-increasing accuracy.
Again we can say that


$f(x) = \lim_{n \to \infty} s_n(x)$    (7-5.5)


This statement is identical with the statement of equation (1) but 
the simple shorthand by which the previous statement could be put 
in symbols is no longer at our disposal. 

The use of such “flexible” coefficients extends the power of
infinite expansions very noticeably. By adjusting the coefficients to 
the number of terms we may reduce the error of the finite series for 
the same number of terms. Moreover, we may expand functions into 
infinite series which are not expandable with the help of rigid 
coefficients. 

The value of such expansions was already demonstrated when we 
were dealing with the differentiation of a Fourier series (cf. IV, 6). 
We have seen how advantageous it is to smooth out the Gibbs 
oscillations by replacing f(x) by a modified $f_n(x)$, obtained by taking
the arithmetic mean of f(x) between the points x − π/n and x + π/n.
The width of this integration process changes with n. Hence the
function we are expanding changes constantly with the number of
terms, and thus the coefficients of the expansion also change.
Nevertheless, as n increases to infinity, $f_n(x)$ approaches the given
f(x) more and more, and so does the Fourier expansion, which comes
nearer and nearer to $f_n(x)$. A function like y = tan x, which is not
integrable and thus not expandable by the method of rigid coefficients,
becomes immediately expandable if flexible coefficients are used. 
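The smoothing by local averaging may be illustrated by a small numerical experiment. The sketch below is not taken from the text; it assumes Python with NumPy and chooses the square wave sign(sin x), whose rigid Fourier coefficients are 4/(πk) for odd k, as a test function. Averaging sin kx between x − π/n and x + π/n multiplies it by sin(kπ/n)/(kπ/n), so that each coefficient of the n-term sum receives a factor which depends on n. The function name `square_wave_sum` is hypothetical.

```python
import numpy as np

def square_wave_sum(x, n, smooth=False):
    """Sum of the harmonics k < n of sign(sin x), optionally with averaged coefficients."""
    total = np.zeros_like(x)
    for k in range(1, n, 2):
        a_k = 4.0 / (np.pi * k)          # rigid coefficient, independent of n
        if smooth:
            # Mean over [x - pi/n, x + pi/n]: factor sin(k pi/n)/(k pi/n).
            # np.sinc(u) is sin(pi u)/(pi u).
            a_k *= np.sinc(k / n)
        total += a_k * np.sin(k * x)
    return total

x = np.linspace(0.0, np.pi, 4001)
n = 20
rigid_max = square_wave_sum(x, n).max()
flexible_max = square_wave_sum(x, n, smooth=True).max()
print(rigid_max, flexible_max)
```

The true function never exceeds 1 in this range; the peak of the rigid sum overshoots this value markedly (the Gibbs oscillation), while the peak of the sum with n-dependent coefficients stays much closer to it.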




The greatest importance of flexible coefficients for parexic purposes 
lies, however, in the fact that the coefficients of an orthogonal 
expansion require evaluation of certain definite integrals which are 
frequently not at our disposal. Hence it is of great value to know 
that such an expansion of n terms may be replaced by another 
expansion of n terms, using the same n functions but with different 
coefficients which can be evaluated more easily. And yet the error of 
this modified expansion may not be essentially worse than the error 
of the classical expansion, with practically unavailable coefficients. 


6. Expansions in orthogonal polynomials. In V, 20 we have 
discussed an important family of polynomials, called Jacobi polynomials.
The members of this doubly infinite family of polynomials differ from
each other according to the weight function with respect to which the
orthogonality holds. But whatever this weight function is (its 
positiveness being assured), an arbitrary function, restricted by very 
few conditions (absolute integrability suffices if Fejér’s summation 
method is employed, cf. IV, 2), can be expanded in these polynomials 
and thus they all share the same analytical properties. Is there any 
reason why we should make a choice in favor of a special weight 
factor when all these polynomials have the same capacity in represent- 
ing arbitrary functions? 

The differences reveal themselves at once if we inquire deeper into 
the meaning of the word “represent.” We can be easily deceived
if we endow the concept of a “limit” with magic significance. We
may be inclined to believe that the expression “the limit of an
infinite expansion becomes f(x)” means that a proper combination
of given functions eventually yields f(x). But in actual fact this 
“eventually” never occurs because it is not possible to add up an 
infinity of terms. What is meant is only that adding up more and 
more terms we can approach f (x) as near as we wish, provided that 
we do not say: “I want to get f(x) exactly.” Hence an error is 
admittedly tolerated and what we claim is only that this error can 
be diminished to an arbitrarily small amount. 

Now, if somebody prescribes an arbitrarily small error limit ±ε,
the pure analyst is satisfied in showing that the sum of a sufficiently 
large number of terms, constructed according to a certain scheme, 
will match the given limit. He can be lavish with the number of 
terms and it makes no difference to him whether 20 or 10,000 terms 




are needed for his purpose. The viewpoint of the applied analyst is 
markedly different. Given a certain error limit ±ε, his aim will be to
economize as much as possible in the number of terms which can 
match that limit. 

Here is where the various classes of orthogonal polynomials 
diverge sharply in their behavior. It is entirely possible that a 
certain kind of polynomials will attain with 20 terms the accuracy 
for which another kind of polynomials may need 10,000 terms. 
This is understandable if we investigate the role of the weight 
factor p(x). Let us assume that p(x) is such that it is large in the 
immediate neighborhood of zero but then drops to almost 
negligible amounts. This means that we take the immediate neigh- 
borhood of zero seriously, at the expense of the rest of the interval. 
Now a finite sum of such an expansion will show the following 
behavior. It will give f(x) with great accuracy around the origin 
but in the rest of the interval the accuracy will be greatly diminished. 
Since, however, the error limit ±ε shall mean that the deviation of
the series from f(x) shall not surpass ±ε at any point of the interval
[−1, +1], we see at once that the unfortunate nature of the weight factor
compels us to pile up a large number of terms, since it will take a long
time before the accuracy ε is reached at the two ends ±1 of the
interval. 

The Taylor expansion around the point x = 0 actually operates 
in this uneconomical fashion since we take all our information from 
the functional values infinitely near to x = 0. This explains why the 
Taylor series, in spite of its tremendous analytical significance, is 
frequently of little use in applied problems and can be replaced by 
much more useful expansions. 

Let us now focus our attention on the Jacobi polynomials. We 
have a doubly infinite family of polynomials (cf. 5-20.5),
characterized by the two parameters γ and δ. However, we will
seldom be interested in a “lopsided” weighting in which the left
and the right sides of the interval [−1, +1] are weighted differently.
If we restrict ourselves to symmetric weighting, the Jacobi polynomials
are at once reduced to the “ultraspherical polynomials”
(5-20.7) which depend on one parameter γ only. This parameter can
still vary between 0 and ∞. The case γ = ∞ corresponds to the
aforementioned overemphasis of the point x = 0 at the cost of all
other points and thus leads to the Taylor series. The case γ = 1




leads to the Legendre polynomials for which the most natural 
weighting p(x) = 1 is realized. Here every point of the interval [−1, +1]
is taken equally seriously. And yet this is not necessarily the best 
choice. 

If we want to investigate the error of a truncated series, we do not 
go far wrong if we estimate the error by the first neglected term of 
the series. Let us expand f(x) in Legendre polynomials $P_n(x)$. Then
the error $\eta_n(x)$ of a finite expansion of n terms will be approximately

$\eta_n(x) \approx c_n P_n(x)$

Now what is the nature of $P_n(x)$? It is a function which oscillates
back and forth around zero, with variable amplitudes. The amplitudes
steadily (although slowly) increase as we leave the mid-point
of the range and reach the maximum 1 at the two endpoints x = ±1.
Hence the error is not quite evenly distributed over the range. In
order to assure that $\eta_n$ does not surpass ±ε even at the endpoints ±1,
we have to pay a price by a too large number of terms, because we
have obtained more than we wanted in the inside of the range. It is
similarly of no advantage if the amplitudes of the error oscillations
constantly decrease, as we go from x = 0 towards x = 1. In this
case we get more than what we want at the periphery of the interval
and again we have to pay the price.

Now the ultraspherical polynomials $P_n^{\gamma}(x)$ have the following
behavior. If γ has any value between ∞ and ½, the amplitudes of
the successive oscillations steadily increase as we proceed from
x = 0 to x = 1; if γ has any value between ½ and 0, they steadily
decrease. The limit γ = ½ is thus of special interest. Here the
amplitudes remain constant throughout the interval. Hence we have
distributed the errors in the most advantageous fashion, namely
uniformly over the entire range. The number of terms now needed
for a certain accuracy ±ε cannot be diminished. We have obtained
maximum convergence inasmuch as the smallest number of terms is
reached by which an approximation of f(x) can be achieved which
does not deviate from the true value by more than ±ε at any point of
the given range [−1, +1]. These polynomials, called after the
Russian mathematician Chebyshev, are actually the most elementary
(and historically oldest) set of polynomials, because of their simple
relation to the elementary trigonometric functions cos nθ.
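The different economy of the three weightings γ = ∞, 1, ½ can be observed on a simple example. The following sketch is only an illustration under assumptions of our own: it uses Python with NumPy, chooses f(x) = e^x in [−1, +1], keeps the powers up to the fourth, and obtains the Legendre and Chebyshev coefficients by numerical quadrature (Gauss-Legendre nodes and the midpoint rule in the angle variable, respectively), which is a convenience of machine computation and not a procedure of the text.

```python
import numpy as np
from math import factorial
from numpy.polynomial import chebyshev, legendre

deg = 4                                   # highest power retained
x = np.linspace(-1.0, 1.0, 2001)
exact = np.exp(x)

# Taylor series around x = 0, truncated after x^deg.
taylor_sum = sum(x**k / factorial(k) for k in range(deg + 1))

# Legendre coefficients c_k = (2k + 1)/2 * integral of f(x) P_k(x) over [-1, 1],
# evaluated by Gauss-Legendre quadrature.
nodes, weights = legendre.leggauss(40)
unit = np.eye(deg + 1)                    # unit[k] selects P_k in legval
leg_coef = [(2 * k + 1) / 2.0
            * np.sum(weights * np.exp(nodes) * legendre.legval(nodes, unit[k]))
            for k in range(deg + 1)]
leg_sum = legendre.legval(x, leg_coef)

# Chebyshev coefficients a_k = (2/pi) * integral of f(cos t) cos(kt) over [0, pi],
# evaluated by the midpoint rule; a_0 enters the sum with half weight.
n_nodes = 40
t = np.pi * (np.arange(n_nodes) + 0.5) / n_nodes
cheb_coef = [2.0 / n_nodes * np.sum(np.exp(np.cos(t)) * np.cos(k * t))
             for k in range(deg + 1)]
cheb_coef[0] *= 0.5
cheb_sum = chebyshev.chebval(x, cheb_coef)

err_taylor = np.max(np.abs(taylor_sum - exact))
err_legendre = np.max(np.abs(leg_sum - exact))
err_chebyshev = np.max(np.abs(cheb_sum - exact))
print(err_taylor, err_legendre, err_chebyshev)
```

For this particular function the largest error in the range decreases as we go from the Taylor series to the Legendre and then to the Chebyshev expansion, in agreement with the argument above.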

1 See [4], p. VIII. 




7. The Chebyshev polynomials. Apart from the feature of maximum
convergence, the polynomials $T_k(x)$ (cf. 5-20.12) have many
other valuable properties which make them a particularly interesting
class of functions. Their most valuable property is the fact that they
are expressible in terms of elementary trigonometric functions. They
are nothing but the simple trigonometric functions cos kθ, but
expressed in the variable


$x = \cos\theta$    (7-7.1)


Consider an arbitrary integrable function of bounded variation
y = f(x). Substituting cos θ for x, we get the function


$y = f(\cos\theta)$    (7-7.2)


which is now a periodic function of the angle variable θ in the
interval [−π, +π]. Since a change of θ to −θ does not modify
the value of the function, f(cos θ) is an even function of θ which may
be expanded in a Fourier series


$f(\cos\theta) = \tfrac{1}{2}a_0 + a_1\cos\theta + a_2\cos 2\theta + \cdots$    (7-7.3)
with


$a_k = \frac{2}{\pi}\int_0^{\pi} f(\cos\theta)\cos k\theta\, d\theta$    (7-7.4)


This same series, viewed from the standpoint of the variable x,
appears in the form:


$f(x) = \tfrac{1}{2}a_0 + a_1 T_1(x) + a_2 T_2(x) + \cdots$    (7-7.5)
with

$a_k = \frac{2}{\pi}\int_{-1}^{+1} f(x)\, T_k(x)\, \frac{dx}{\sqrt{1 - x^2}}$    (7-7.6)


This is a special case of the expansion into ultraspherical polynomials
(cf. § 6), for the choice γ = ½, which characterizes the
Chebyshev polynomials.

Expansion of a function into Chebyshev polynomials is thus a 
mere reinterpretation of the expansion of an even function into a 
cosine series. This fundamental relation, which translates the out- 
standing properties of the Fourier series into the realm of power 
expansions, is the most important property of the Chebyshev 
polynomials. 
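That the Chebyshev expansion is the cosine series in another dress can be checked directly. The sketch below is an illustration under assumptions of our own: Python with NumPy, the test function f(x) = e^x, and the midpoint rule in θ as a substitute for the integral defining the coefficients $a_k$. The helper name `cosine_coefficients` is hypothetical; `chebval` is NumPy's evaluator for sums of the $T_k(x)$.

```python
import numpy as np
from numpy.polynomial import chebyshev

def cosine_coefficients(f, n_terms, n_nodes=64):
    """a_k = (2/pi) * integral of f(cos t) cos(kt) over [0, pi], by the midpoint rule."""
    theta = np.pi * (np.arange(n_nodes) + 0.5) / n_nodes
    samples = f(np.cos(theta))
    return np.array([2.0 / n_nodes * np.sum(samples * np.cos(k * theta))
                     for k in range(n_terms)])

a = cosine_coefficients(np.exp, 8)

theta0 = 1.2
x0 = np.cos(theta0)

# The series read as a cosine series in theta ...
in_theta = 0.5 * a[0] + sum(a[k] * np.cos(k * theta0) for k in range(1, len(a)))

# ... and the same series read as an expansion in T_k(x).
c = a.copy()
c[0] *= 0.5                      # chebval expects the full coefficient of T_0
in_x = chebyshev.chebval(x0, c)

print(in_theta, in_x, np.exp(x0))
```

Both readings give the same number, and with eight terms this number already agrees with e^x to about six decimal places at the chosen point.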




For numerical applications, the integer nature of the Chebyshev 
polynomials and their simple relation to the binomial coefficients 
are convenient. The trigonometric formula 


$\cos (k+1)\theta + \cos (k-1)\theta = 2\cos\theta\cos k\theta$


gives in translation the recursion formula for the Chebyshev polynomials¹


$T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x)$    (7-7.7)
starting with
$T_0(x) = 1, \quad T_1(x) = x$    (7-7.8)


It leads throughout to integer coefficients. In fact, the relation (7)
shows that the coefficient of $x^n$ must be divisible by $2^{n-1}$ and still
remain an integer. If we write


$T_k(x) = c_k^0 + c_k^1 x + \cdots + c_k^k x^k$    (7-7.9)


we obtain for the coefficients the expression, 


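$c_k^m = 2^{m-1}\left[2\binom{\frac{k+m}{2}}{m} - \binom{\frac{k+m}{2}-1}{m-1}\right](-1)^{\frac{k-m}{2}}$    (7-7.10)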


with the understanding that any coefficient for which k + m is odd 
is put equal to zero. The first ten $T_k(x)$ are tabulated in Table V of
the Appendix. 
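The recursion formula lends itself to immediate mechanization. The following is a minimal sketch in Python (our choice of language, not the text's) which builds the integer coefficients of the $T_k(x)$ from the starting values $T_0 = 1$, $T_1 = x$; the function names are hypothetical.

```python
import math

def chebyshev_power_coefficients(k_max):
    """Return polys with polys[k][m] = coefficient of x^m in T_k(x), k = 0..k_max."""
    polys = [[1], [0, 1]]                     # T_0 = 1, T_1 = x
    for k in range(1, k_max):
        prev, cur = polys[k - 1], polys[k]
        nxt = [0] + [2 * c for c in cur]      # 2x * T_k(x)
        for m, c in enumerate(prev):
            nxt[m] -= c                       # minus T_{k-1}(x)
        polys.append(nxt)
    return polys[:k_max + 1]

def evaluate(poly, x):
    """Evaluate sum poly[m] x^m by Horner's scheme."""
    result = 0.0
    for c in reversed(poly):
        result = result * x + c
    return result

polys = chebyshev_power_coefficients(6)
for k, poly in enumerate(polys):
    print(k, poly)

# T_k(cos t) should reproduce cos(kt).
t = 0.7
check = max(abs(evaluate(polys[k], math.cos(t)) - math.cos(k * t)) for k in range(7))
```

All coefficients remain integers, every second one vanishes, and the highest coefficient of $T_k(x)$ comes out as $2^{k-1}$.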


8. The shifted Chebyshev polynomials. In many problems of 
applied analysis, normalization of the mid-point of the range to 
x = 0 is highly inconvenient. We frequently want to expand a
function around the point x = 0, but are interested only in the
positive range. Or we may want to expand about the point x = ∞,
using reciprocal powers, which is again reduced to the previous
problem if ξ = 1/x is introduced as a new variable. Here again the
analytical natures of f(ξ) to the left and to the right of ξ = 0 are
usually completely incongruous and should not be treated together. 
Hence the normalization of the range to [0,1] is frequently much 
more convenient than the previous normalization [—1,+1]. This 


1 For further algebraic and functional properties of the $T_k(x)$, cf. [4],
Introduction.




is particularly true if we want to utilize the “parametric method,” 
encountered earlier in VI, 21 [cf. (6-21.12)]. 
We put 

$z = \alpha z_1$    (7-8.1)


and make the point z = α correspond to the point $z_1 = 1$ in the new
variable. Although α is treated as a constant during the problem,
it can be identified with any complex value z. Then, by taking the
function at the end point $z_1 = 1$ of the range, we have actually
obtained f(z) in the original variable. Here again it is necessary that
the variable $z_1$ shall be normalized to the interval [0,1].
The renormalization of the $T_k(x)$ to the new range is quite simple.
We now put
$\cos\theta = 2x - 1$    (7-8.2)
or
$x = \frac{1 + \cos\theta}{2} = \cos^2\frac{\theta}{2}$    (7-8.3)


While θ varies from 0 to π, x varies from 1 to 0. The Chebyshev
polynomials are again defined by


$T_k^*(x) = \cos k\theta$    (7-8.4)


but now expressed in the new variable (3). The new polynomials
have entirely different coefficients, although they would be obtainable
from the previous $T_k(x)$ by a simple substitution.


$T_k^*(x) = T_k(2x - 1)$    (7-8.5)


These $T_k^*(x)$ are now, apart from a sign, directly equal to a hypergeometric
function [cf. 5-20.12].


$T_k^*(x) = (-1)^k F(-k, k, \tfrac{1}{2}; x)$    (7-8.6)


If $T_k^*(x)$ is again written in the form (7.9), the coefficients $c_k^m$ now
become


$c_k^m = 2^{2m-1}\left[2\binom{k+m}{2m} - \binom{k+m-1}{2m-1}\right](-1)^{k+m}$    (7-8.7)


These polynomials are no longer alternately even and odd functions.
The coefficients of all orders are present between 0 and k, with
alternating ± signs. The first ten $T_k^*(x)$ are tabulated in Table VII
of the Appendix. 
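The coefficients of the shifted polynomials can be generated in the same fashion as before, if in the recursion the variable x is replaced by 2x − 1. The sketch below (Python, with function names of our own invention) does this with integer arithmetic and compares the result with the closed expression $2^{2m-1}[2\binom{k+m}{2m} - \binom{k+m-1}{2m-1}](-1)^{k+m}$ for m ≥ 1. Here $T_0^*$ is taken as 1, in accordance with the general definition.

```python
from math import comb

def shifted_chebyshev_coefficients(k_max):
    """Return table with table[k][m] = coefficient of x^m in T_k^*(x) = T_k(2x - 1)."""
    table = [[1], [-1, 2]]                    # T_0^* = 1, T_1^* = 2x - 1
    for k in range(1, k_max):
        prev, cur = table[k - 1], table[k]
        nxt = [0] * (len(cur) + 1)
        for m, c in enumerate(cur):           # (4x - 2) * T_k^*(x)
            nxt[m] -= 2 * c
            nxt[m + 1] += 4 * c
        for m, c in enumerate(prev):          # minus T_{k-1}^*(x)
            nxt[m] -= c
        table.append(nxt)
    return table[:k_max + 1]

def closed_form(k, m):
    """Closed expression for the coefficient of x^m in T_k^*(x), valid for 1 <= m <= k."""
    bracket = 2 * comb(k + m, 2 * m) - comb(k + m - 1, 2 * m - 1)
    return 2 ** (2 * m - 1) * bracket * (-1) ** (k + m)

table = shifted_chebyshev_coefficients(6)
for k, row in enumerate(table):
    print(k, row)
```

The last row printed contains the coefficients 1, −72, 840, −3584, 6912, −6144, 2048 of $T_6^*(x)$, arranged from the lowest to the highest power.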




The polynomial $T_0^*(x)$ requires special attention. According to the
general definition (4) we must put $T_0^*(x) = 1$. It so happens, however,
that in many summation formulas involving Chebyshev polynomials
the polynomial $T_0^*(x)$ enters with half weight only and has to be
treated separately. In order to avoid this inconvenience, we will use
the following convention. We write $T_0(x) = 1$, but we put


$T_0^*(x) = \tfrac{1}{2}$    (7-8.8)


9. Telescoping of a power series by successive reductions. The 
discussions of § 6 have shown that the same function f(x) may be 
expanded into a whole spectrum of different power series. They all 
represent the same function, but with very different degrees of 
convergence. If our aim is absolute accuracy, all these expansions are 
essentially equivalent. We have to take in more and more terms in 
order to reduce the error below an arbitrarily small ε. If, however, our
aim is limited accuracy, these expansions are vastly different. We may 
expand directly in the powers of x, as we do in the case of the Taylor 
series. The convergence is then the slowest, i.e., the error, if we 
terminate the series after the nth term, is the largest. On the other 
end of the spectrum are the Chebyshev polynomials. If we expand 
into these polynomials, the convergence is the fastest, i.e., the error, 
if we terminate the series after the nth term, is the smallest. In fact we 
have obtained a numerical comparison of the two expansions and 
found that the error of the latter series is reduced in order of magnitude
by the factor $2^n$ (in the range [0,1] the same factor becomes
$2^{2n-1}$). This makes it possible to obtain great accuracy with even a
small number of terms. 

To obtain directly an expansion in Chebyshev polynomials by 
evaluating the coefficients $c_k$ on the basis of (7.6) will be only rarely
possible. In all probability evaluation of the definite integrals (7.6) 
would become too cumbersome. But there are frequently other 
procedures available which require much less labor. We may start 
for example, with the Taylor series, which is frequently at our disposal,
and “telescope” that series into a much shorter series, without
losing essentially in accuracy. In order to see the operation of the 
method we will assume that a certain finite polynomial of the order n 
is given. We ask the question whether this polynomial is replaceable 
by a polynomial of lower order without losing too much in accuracy. 




Suppose the following function is given in the interval [0,1]:
$y = 1 - x + x^2 - x^3 + x^4 - x^5 + x^6$    (7-9.1)


The graph of this function is quite smooth. Yet it is associated with a
polynomial of 6th order. The last term is certainly not negligible
without committing a large error. Hence it seems that we need all these
powers for the representation of y. Yet this is not the case. By a proper
technique our long expansion can be simplified.

[Figure: graph of y = f(x) in the range from x = 0 to x = 1.]

Let us rewrite the defining equation for $T_6^*(x)$ in the following form.
$x^6 = \frac{6144x^5 - 6912x^4 + 3584x^3 - 840x^2 + 72x - 1}{2048} + \frac{T_6^*(x)}{2048}$    (7-9.2)
This equation shows a remarkable fact. The power $x^6$ is algebraically
independent of the lower powers and is certainly not expressible as
a linear combination of lower powers. But this independence, if the
range [0,1] is considered, is astonishingly weak. In this range $x^6$ is
almost equal to a definite linear combination of the lower powers. If
we write


$x^6 = \frac{6144x^5 - 6912x^4 + 3584x^3 - 840x^2 + 72x - 1}{2048}$    (7-9.3)


which means
$x^6 = 3x^5 - 3.375x^4 + 1.75x^3 - 0.410156x^2 + 0.035156x - 0.000488$    (7-9.4)


we commit an error which is nowhere larger than 1/2048 = 0.000488,
in view of the fact that $T_6^*(x)$ oscillates between ±1 throughout the
range. Moreover, this accuracy cannot be surpassed, since the
coefficient 2048 of $x^6$ in $T_6^*(x)$ is the largest possible for any
polynomial of sixth order which in the given interval oscillates between ±1.

We see that if we do not neglect $x^6$ but reduce it to the lower powers
according to the equation (4), we commit an error which is very small
and may well be below the accuracy we demand. Substituting (4)
into the given y = f(x) we obtain the approximation


$\bar{y} = 0.999512 - 0.964844x + 0.589844x^2 + 0.75x^3 - 2.375x^4 + 2x^5 \quad (\pm 0.0005)$    (7-9.5)


The new $\bar{y}$ is a polynomial of only fifth order.

We can obviously continue this process. The table of the
Chebyshev polynomials $T_k^*(x)$ shows that $x^5$ can be expressed by the
lower powers as follows.


$x^5 = \frac{1280x^4 - 1120x^3 + 400x^2 - 50x + 1}{512} \quad \left(\pm\frac{1}{512}\right)$    (7-9.6)
$= 2.5x^4 - 2.1875x^3 + 0.78125x^2 - 0.097656x + 0.001953 \quad (\pm 0.00195)$


If we substitute this expression into our previous $\bar{y}$, we obtain a new
approximation which uses not more than four powers:


$\bar{y} = 1.003418 - 1.160156x + 2.152344x^2 - 3.625x^3 + 2.625x^4 \quad (\pm 0.0044)$    (7-9.7)


The error has increased somewhat but it may still lie substantially
below the accuracy we expect. Hence we can try our luck once more.
According to the table $x^4$ is expressible by the lower powers with an
error slightly less than 0.01.


$x^4 = \frac{256x^3 - 160x^2 + 32x - 1}{128} \quad \left(\pm\frac{1}{128}\right)$    (7-9.8)
$= 2x^3 - 1.25x^2 + 0.25x - 0.007812 \quad (\pm 0.00781)$


Substituting back in (9.7) we now obtain the approximation
$\bar{y} = 0.982910 - 0.503906x - 1.128906x^2 + 1.625x^3 \quad (\pm 0.0249)$    (7-9.9)


The error is now much larger than before but still sufficiently small
for most practical purposes.
If we go still further by one step, we can substitute


$x^3 = \frac{48x^2 - 18x + 1}{32} = 1.5x^2 - 0.5625x + 0.03125 \quad (\pm 0.03125)$    (7-9.10)
and obtain


$\bar{y} = 1.0337 - 1.4180x + 1.3086x^2 \quad (\pm 0.0757)$    (7-9.11)




The original long polynomial of sixth order has been reduced to a 
new polynomial of but second order with an accuracy which may still 
suffice in many cases. 
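The successive reductions of this section follow a fixed pattern and can be left to the machine. The following sketch, written in Python with exact fractions, is our own illustration of the procedure and not a program taken from the text; the names `shifted_chebyshev`, `telescope` and `evaluate` are hypothetical. In each step the highest power $x^n$ is replaced by the lower powers contained in $T_n^*(x)$, and the magnitude of the dropped term is added to the error bound.

```python
from fractions import Fraction

def shifted_chebyshev(k):
    """Coefficients (lowest power first) of T_k^*(x) = T_k(2x - 1), with T_0^* = 1."""
    prev, cur = [1], [-1, 2]
    if k == 0:
        return prev
    for _ in range(k - 1):
        nxt = [0] * (len(cur) + 1)
        for m, c in enumerate(cur):           # (4x - 2) * T_k^*(x)
            nxt[m] -= 2 * c
            nxt[m + 1] += 4 * c
        for m, c in enumerate(prev):          # minus T_{k-1}^*(x)
            nxt[m] -= c
        prev, cur = cur, nxt
    return cur

def telescope(coeffs, target_degree):
    """Reduce sum coeffs[m] x^m on [0, 1] to the target degree; return (coeffs, bound)."""
    coeffs = [Fraction(c) for c in coeffs]
    bound = Fraction(0)
    while len(coeffs) - 1 > target_degree:
        n = len(coeffs) - 1
        t = shifted_chebyshev(n)
        scale = coeffs.pop() / t[n]           # coefficient of x^n over that of T_n^*
        for m in range(n):
            coeffs[m] -= scale * t[m]         # x^n expressed by the lower powers
        bound += abs(scale)                   # dropped term: scale * T_n^*, |T_n^*| <= 1
    return coeffs, bound

def evaluate(coeffs, x):
    """Evaluate sum coeffs[m] x^m by Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + float(c)
    return result

y = [1, -1, 1, -1, 1, -1, 1]                  # 1 - x + x^2 - ... + x^6
reduced, bound = telescope(y, 2)
print([round(float(c), 6) for c in reduced], round(float(bound), 6))
```

Applied to the polynomial of sixth order of this section, the sketch returns the quadratic polynomial and the error bound obtained above by hand; other target degrees reproduce the intermediate stages.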


10. Telescoping of a power series by rearrangement. If we analyze 
closely what we have done in the previous section, we come to a new 
formulation of our procedure. Suppose we neglect nothing but write 
the first step of our reduction process in the form 


$y = y_1 + 0.000488\,T_6^*$
Then the next step can be written as follows.


$y_1 = y_2 + 0.003906\,T_5^*$


The next step gave $y_2 = y_3 + 0.020508\,T_4^*$

and the last step, $y_3 = y_4 + 0.050781\,T_3^*$

All these steps combined give

$y = y_4 + 0.050781\,T_3^* + 0.020508\,T_4^* + 0.003906\,T_5^* + 0.000488\,T_6^*$


Suppose that we continued this reduction process to the very end,
arriving finally at a constant. We would have obtained [with
$T_0^* = \tfrac{1}{2}$, cf. (8.8)]


$y = 1.630859\,T_0^* - 0.054687\,T_1^* + 0.163574\,T_2^* + 0.050781\,T_3^* + 0.020508\,T_4^* + 0.003906\,T_5^* + 0.000488\,T_6^*$


What we have here is no longer an ordinary power series, but an 
expansion in Chebyshev polynomials. To be sure, we have exactly the 
same function with which we started. It is only the form in which that
function appears that is different. This change of form, however, is 
exceedingly beneficial for parexic purposes. 

Instead of going from step to step, we can proceed in a more 
systematic manner as follows. Just as the $T_k^*(x)$ are expressible as
linear combinations of the powers of x, we can reverse the process
and express the powers of x as linear combinations of the $T_k^*(x)$.
This can easily be done if we remember the original definition of the
polynomials $T_k^*(x)$:


$T_n^*(x) = \cos n\theta, \quad x = \cos^2\frac{\theta}{2}$    (7-10.1)


Hence


$x^n = \cos^{2n}\frac{\theta}{2} = \left(\frac{e^{i\theta/2} + e^{-i\theta/2}}{2}\right)^{2n}$    (7-10.2)

$= \frac{1}{2^{2n-1}}\left[\cos n\theta + \binom{2n}{1}\cos (n-1)\theta + \binom{2n}{2}\cos (n-2)\theta + \cdots + \frac{1}{2}\binom{2n}{n}\right]$

$= \frac{1}{2^{2n-1}}\left[T_n^*(x) + \binom{2n}{1}T_{n-1}^*(x) + \binom{2n}{2}T_{n-2}^*(x) + \cdots + \binom{2n}{n}T_0^*\right]$


We see that, apart from a constant factor, a very simple binomial
weighting of the $T_k^*(x)$ gives the powers $x^n$.

We can thus add to the table of the Chebyshev polynomials $T_k^*(x)$
a new table which solves the inverse problem; it expresses the powers
of x in terms of the $T_k^*(x)$ (cf. Table VIII of the Appendix), for
example:


$1 = 2T_0^*$
$x = \tfrac{1}{2}(2T_0^* + T_1^*)$
$x^2 = \tfrac{1}{8}(6T_0^* + 4T_1^* + T_2^*)$


This conversion process is numerically not too cumbersome, 
particularly if we multiply the given series by a proper factor which 
cancels out all the denominators. The rest is straight multiplication 
of the given coefficients by integers and additions, which can be done 
on the machine cumulatively. In our problem for example we will 
weight the given coefficients, from the lowest to the highest, by the 
factors 


$\frac{4096,\ 1024,\ 256,\ 64,\ 16,\ 4,\ 1}{2048}$
This gives, leaving aside the denominator,


4096, −1024, 256, −64, 16, −4, 1


This row is now multiplied by the successive columns of the following
matrix, obtained directly from Table VIII:


$\begin{array}{rrrrrrr}
1 \\
2 & 1 \\
6 & 4 & 1 \\
20 & 15 & 6 & 1 \\
70 & 56 & 28 & 8 & 1 \\
252 & 210 & 120 & 45 & 10 & 1 \\
924 & 792 & 495 & 220 & 66 & 12 & 1
\end{array}$


This gives
$2048y = 3340\,T_0^* - 112\,T_1^* + 335\,T_2^* + 104\,T_3^* + 42\,T_4^* + 8\,T_5^* + T_6^*$
Hence, dividing by 2048,
$y = 1.630859\,T_0^* - 0.054687\,T_1^* + 0.163574\,T_2^* + 0.050781\,T_3^* + 0.020508\,T_4^* + 0.003906\,T_5^* + 0.000488\,T_6^*$


What we gain by this rearrangement of the original power expansion 
is the increased convergence of the series. The coefficients of the 
original series had the values


1, −1, 1, −1, 1, −1, 1
The coefficients of the new series have the values 


1.6308, —0.0547, 0.1636, 0.0508, 0.0205, 0.0039, 0.0005 


Without omissions, the new series is merely a modified form of the 
original series and nothing is gained. But in fact the much more 
rapid convergence of the new series will allow us to drop terms. The 
error caused by neglecting terms is simply given by the coefficients 
of the neglected terms. We get an upper bound of the error at any 
point of the range by adding up the absolute values of all the neglected 
coefficients. In our example, all the last coefficients happen to be 
accidentally positive numbers. This, however, is quite immaterial, 
since the error committed by neglecting a certain term is always 
oscillatory in nature, and thus it is imperative that the sum of the
absolute values of all the omitted coefficients be taken for a
realistic estimation of the maximum error.

In our problem, for example, we obtain the successive error 
estimates, 


0.0005, 0.0044, 0.0249, 0.0757 




Depending on the accuracy desired, we can tell at once how many 
terms we can drop without committing an error which is above the 
permissible limit. 

Whether we shall prefer the successive reduction process of the 
previous section or the systematic rearrangement process of the 
present section will depend on the numerical nature of the problem. 
Both processes lead to the same result, namely the telescoping of a
given power expansion due to increased (and maximized) convergence. 
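The systematic rearrangement may be sketched in a few lines. The code below is an illustration of our own in Python with exact fractions (the function name is hypothetical): each power $x^n$ is distributed over the $T_k^*$ with the binomial weights $\binom{2n}{n-k}$ and the factor $2^{-(2n-1)}$, under the convention $T_0^* = \tfrac{1}{2}$. The error estimates are then obtained by adding up the absolute values of the coefficients to be dropped.

```python
from fractions import Fraction
from math import comb

def power_series_to_shifted_chebyshev(coeffs):
    """Return d with sum coeffs[n] x^n = sum d[k] T_k^*(x) on [0, 1], where T_0^* = 1/2."""
    d = [Fraction(0)] * len(coeffs)
    for n, c in enumerate(coeffs):
        weight = Fraction(2, 4 ** n)          # the factor 2^{-(2n-1)}
        for k in range(n + 1):
            d[k] += Fraction(c) * weight * comb(2 * n, n - k)
    return d

y = [1, -1, 1, -1, 1, -1, 1]                  # 1 - x + x^2 - ... + x^6
d = power_series_to_shifted_chebyshev(y)
print([int(2048 * c) for c in d])

# Error estimates for dropping T_6^*; T_6^*, T_5^*; and so on down to T_3^*.
estimates = []
running = Fraction(0)
for c in reversed(d[3:]):
    running += abs(c)
    estimates.append(float(running))
print([round(e, 4) for e in estimates])
```

For the example of this section the first line printed gives the numerators 3340, −112, 335, 104, 42, 8, 1 over the common denominator 2048, and the second line the successive error estimates.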


11. Power expansions beyond the Taylor range. The methods of 
the two previous sections are applicable only if we start with a given 
power expansion. We have found a procedure by which this expan- 
sion is replaceable by a shorter expansion if a certain error can be 
tolerated. Hence we could start, for example, with a given Taylor 
expansion and reduce it to a more economical expansion of fewer 
terms. But this procedure may become very cumbersome if we come 
near to the convergence radius of the Taylor series, since then we 
would need a very large number of terms at the start, and the 
numerical work might become prohibitive. 

Even worse is the situation if we want a power expansion in an 
interval which goes beyond the convergence of the Taylor series. For 
example, the function 


$y = \frac{1}{1 + x}$    (7-11.1)


possesses a convergent Taylor series only up to x = 1, while we may
want to approximate this function in the range [0,4]. Here the
rearrangement procedure loses its significance since we have no
guarantee that operating with a divergent series will give convergent
results.

It is thus necessary to look for methods which will give practically
well-convergent expansions even if we cannot rely on any primary
series which may be rearranged into a more effective form. The
existence of such series is guaranteed from the theory of orthogonal
expansions, which gives the proof that any quadratically integrable
function of bounded variation may be expanded into a complete
orthogonal function system, such as the Legendre polynomials or the
Chebyshev polynomials. Hence the possibility of expanding our
given y = f(x) beyond the convergence interval of the Taylor series




cannot be doubted; the difficulty is only that the coefficients (7.6) of
such an expansion are not at our disposal, since we do not know how
to evaluate conveniently the definite integrals by which these
coefficients are defined. We have to search for other more convenient
methods which may not give us the coefficients $c_k$ but some
other coefficients of practically equivalent effectiveness. 
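That a well-convergent expansion exists beyond the Taylor range can at least be observed numerically. In the sketch below the defining integrals of the Chebyshev coefficients are simply approximated by the midpoint rule in the angle variable. This brute-force quadrature is an assumption of our own (as are Python, NumPy, and all names used); it is not one of the more convenient methods to which the text now proceeds. The function is 1/(1 + x) in the range [0,4], expanded up to the tenth order.

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + x)

a, b = 0.0, 4.0                           # range of approximation
deg = 10
n_nodes = 200

# Map [a, b] to theta in [0, pi] by x = a + (b - a)(1 + cos theta)/2.
theta = np.pi * (np.arange(n_nodes) + 0.5) / n_nodes
x_nodes = a + (b - a) * (1.0 + np.cos(theta)) / 2.0
coef = np.array([2.0 / n_nodes * np.sum(f(x_nodes) * np.cos(k * theta))
                 for k in range(deg + 1)])

def chebyshev_sum(x):
    """Truncated expansion 0.5*coef[0] + sum coef[k] cos(k theta) at the points x."""
    u = np.clip(2.0 * (x - a) / (b - a) - 1.0, -1.0, 1.0)
    th = np.arccos(u)
    return 0.5 * coef[0] + sum(coef[k] * np.cos(k * th) for k in range(1, deg + 1))

x = np.linspace(a, b, 801)
cheb_err = np.max(np.abs(chebyshev_sum(x) - f(x)))

# The Taylor series 1 - x + x^2 - ... of the same order, taken at x = 4.
taylor_at_4 = sum((-4.0) ** k for k in range(deg + 1))
print(cheb_err, taylor_at_4, f(4.0))
```

The partial sum of the Taylor series at x = 4 is of the order of 10^6 instead of 0.2, while the Chebyshev expansion of the same order represents the function throughout the range with an error below 10^-4.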


12. The τ method. We are in the fortunate position that at least
for a limited, but practically very frequent, class of functions a 
satisfactory solution of the problem can be found. Let us consider a 
function which is defined by a linear differential equation (or even 
purely algebraic equation), whose coefficients are rational functions of 
x. Bessel’s differential equation

$y'' + \frac{y'}{x} + \left(1 - \frac{p^2}{x^2}\right)y = 0$    (7-12.1)

and many other differential equations of mathematical physics are
examples for this class of functions. In fact, the majority of the
important functions encountered in the advanced chapters of physics
and engineering belong to this category, if we add those functions
which are not directly defined by such a law but which can be
conceived as the solution of such a differential equation. For example,
the function


$y = \sqrt{1 - x}$    (7-12.2)
does not belong directly to this category but in actual fact appears
immediately in the proper form if by logarithmic differentiation we
find


$\frac{y'}{y} = -\frac{1}{2(1 - x)}$    (7-12.3)
and write this equation in the form
$2(1 - x)y' + y = 0$    (7-12.4)
with the boundary condition
$y(0) = 1$    (7-12.5)


which makes the solution unique. Also Bessel’s differential equation
can be written without any denominator if we multiply through
by $x^2$.

$x^2 y'' + xy' + (x^2 - p^2)y = 0$    (7-12.6)




Generally we will assume that our equation is already written in this 
form. For example, the algebraic definition of the function (11.1) 
may be replaced by the equation 


$(1 + x)y - 1 = 0$    (7-12.7)


All these equations have something in common. We can formally 
substitute a power expansion, 


$y = b_0 + b_1 x + b_2 x^2 + \cdots$    (7-12.8)


in these equations and obtain recurrence relations for the coefficients. 
If we know the proper initial conditions, these coefficients become 
uniquely determined and are easily calculable. The expansion (8) is 
almost always an infinite expansion. This expansion may or may not 
converge. If it converges, it may converge in an infinite range or in a
limited range. But it is also possible (as it happens in the case of the
so-called “asymptotic expansions”) that the formal series (8) exists
but diverges for every point outside x = 0. 
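As an illustration of such a recurrence, take the equation 2(1 − x)y′ + y = 0 with y(0) = 1, which defines y = √(1 − x). Substituting the power expansion and collecting the factor of $x^k$ gives $2(k+1)b_{k+1} - (2k-1)b_k = 0$. The sketch below, in Python with exact fractions, is our own and merely runs this recurrence; the function name is hypothetical.

```python
from fractions import Fraction
from math import sqrt

def sqrt_series_coefficients(n):
    """b_0 .. b_n of y = sum b_k x^k with 2(1 - x) y' + y = 0 and y(0) = 1."""
    b = [Fraction(1)]                         # initial condition b_0 = y(0) = 1
    for k in range(n):
        # Factor of x^k:  2(k + 1) b_{k+1} - (2k - 1) b_k = 0
        b.append(Fraction(2 * k - 1, 2 * (k + 1)) * b[k])
    return b

b = sqrt_series_coefficients(30)
print(b[:5])

# Inside the convergence range the partial sum approaches sqrt(1 - x).
x = 0.5
partial = sum(float(c) * x ** k for k, c in enumerate(b))
print(partial, sqrt(1.0 - x))
```

The first coefficients come out as 1, −1/2, −1/8, −1/16, −5/128, i.e., those of the binomial series of √(1 − x), which converges only for |x| < 1.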

We will now introduce a new viewpoint in the study of these 
expansions by truncating the series (8) to a finite number of terms. 
This means that we try to satisfy our equation by an expansion of the 
form 


$y = b_0 + b_1 x + b_2 x^2 + \cdots + b_n x^n$    (7-12.9)


where n may be given. We will undoubtedly not succeed since the
recurrence relation which demands a definite nonvanishing $b_{n+1}$
cannot be satisfied. We may say that we get an “overdetermined” 
system of equations since we have n + 1 coefficients at our disposal, 
but the number of equations we have to satisfy is larger than n + 1. 
In the case of a homogeneous equation we know in advance that a 
common factor of the coefficients must remain undetermined. Hence 
one of the coefficients, e.g., b_0, can be normalized to 1. The number
of disposable coefficients is then not higher than n, but the number of 
equations to be satisfied will be larger than n. 

The degree of overdetermination can easily be established in 
advance by simple enumeration. For example, equation (4) does not 
increase the order of the nth power, and thus the number of recurrence 
equations will be n + 1. This surpasses the number of permissible 
equations by only one. In the case of Bessel’s equation (6) the last 
term raises the nth power to n + 2, and thus the number of coefficient 




equations will become n + 3. This surpasses the number of permissible
equations by three. In the case of the equation (7) the nth power
is raised to n + 1, hence the number of coefficient equations becomes
n + 2. Here the equation is inhomogeneous and n + 1 coefficients
are at our disposal. Hence the degree of overdetermination is only 
one. 

In all these cases we see that the degree of overdetermination is not 
serious since it is a small constant number, while the number of 
coefficients to be determined can increase arbitrarily. 

We will now remove this overdetermination by putting something 
on the right side of the differential equation. This means that we are 
reconciled to the fact that we cannot solve the given equation 
exactly. The right side can be considered an error term which we 
introduce intentionally in order to make our equation solvable by a 
finite power series. 

Our first thought will be to introduce the Taylor expansion (8) in 
our equation, only truncated to n + 1 terms. Since the coefficients 
were determined by successive recurrences, the only equations which 
are not fulfilled are the last ones. Hence the error term we have to put
on the right side will be of the form τx^n in the case of (4), or τx^{n+1}
in the case of (7), where τ is an a priori undetermined constant.
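
The statement for equation (7) can be checked mechanically. The following Python sketch is only an illustration; the function name `taylor_residual` is our own and does not come from the text. It substitutes the truncated Taylor series of 1/(1 + x) into (1 + x)y - 1 and lists the coefficients of the residual, which should vanish except for the single power x^{n+1}.

```python
from fractions import Fraction


def taylor_residual(n):
    """Coefficients (lowest power first) of (1 + x) * y_n - 1,
    where y_n = sum_{k=0}^{n} (-x)^k is the truncated series (8)."""
    b = [Fraction((-1) ** k) for k in range(n + 1)]
    res = [Fraction(0)] * (n + 2)
    for k, bk in enumerate(b):
        res[k] += bk        # contribution of 1 * b_k x^k
        res[k + 1] += bk    # contribution of x * b_k x^k
    res[0] -= 1             # subtract the constant term 1
    return res


residual = taylor_residual(5)
# Every coefficient below x^6 cancels; the surviving coefficient plays the role of tau.
print(residual)
```

For n = 5 only the coefficient of x^6 survives, so the right side is a single term of the form τx^{n+1}.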

In the case of Bessel’s differential equation it would be necessary to
put three terms of the form

\tau_1 x^n + \tau_2 x^{n+1} + \tau_3 x^{n+2}

on the right side of the differential equation (6). In actual fact, the
proper study of this differential equation shows that for large x we
obtain as a good approximation

y = \frac{e^{ix}}{\sqrt{x}}

If now we take out this asymptotic solution as a factor and write

y = \frac{e^{ix}}{\sqrt{x}} u(x)    (7-12.10)


we have defined a new function u(x) which is much smoother than
the original function y(x). For this u(x) the determining
differential equation becomes

x^2 u'' + 2i x^2 u' - (p^2 - \tfrac{1}{4}) u = 0    (7-12.11)


This equation requires only two τ terms on the right side, since the
power x^{n+2} has disappeared. But a further simplification takes place if
the new variable

\xi = \frac{1}{x}    (7-12.12)

is introduced and we expand in powers of ξ, i.e., in reciprocal powers
of x. The new differential equation in the variable ξ becomes

\xi^2 u'' + 2(\xi - i) u' - (p^2 - \tfrac{1}{4}) u = 0    (7-12.13)

This equation does not go beyond the nth power, and only the single
term τξ^n is needed to remove the overdetermination.

If we are interested in small values of x, a similar simplification can
be achieved for Bessel’s differential equation but by somewhat
different tools. A study of the Taylor expansion reveals that x^p can
be taken out as a universal factor in front of the expansion. Moreover,
the remaining series proceeds in even powers of x only. It will
thus be reasonable to introduce a new variable

t = \frac{x^2}{4}    (7-12.14)

and to take out an amplitude factor. We thus put

y(x) = x^p u(t)    (7-12.15)

This substitution gives for u(t) the following differential equation:

t u'' + (1 + p) u' + u = 0    (7-12.16)

Again we notice that the differential equation in the new form does
not overstep the power t^n. Hence one term of the form τt^n will
remove the overdetermination.
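
As a small check of this enumeration, the sketch below substitutes a polynomial of degree n into t u'' + (1 + p) u' + u and lists the residual coefficients. It is an illustration under our own conventions: the name `tau_residual` and the normalization a_0 = 1 are assumptions, not taken from the text.

```python
from fractions import Fraction


def tau_residual(p, n):
    """Substitute u = sum_{k=0}^{n} a_k t^k (a_0 = 1) into t u'' + (1 + p) u' + u.

    Returns the coefficients a_k and the residual coefficients for t^0 .. t^n."""
    a = [Fraction(1)]
    for k in range(1, n + 1):
        # The equation for the power t^(k-1) reads k (k + p) a_k + a_(k-1) = 0.
        a.append(-a[-1] / (k * (k + p)))
    residual = []
    for j in range(n + 1):
        nxt = a[j + 1] if j < n else 0     # a_(n+1) is absent in the truncated sum
        residual.append((j + 1) * (j + 1 + p) * nxt + a[j])
    return a, residual


coeffs, residual = tau_residual(Fraction(0), 3)
print(coeffs)     # coefficients of the truncated expansion
print(residual)   # only the last entry, the coefficient of t^n, is nonzero
```

Only the power t^n is left unbalanced, in agreement with the statement that a single τ term suffices here.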

This example shows that by the proper transformations we may 
greatly simplify our problem and reduce the degree of overdetermination.
In actual practice a single τ term covers much ground, and more
than two τ terms are almost never required.




A further examination of our problem reveals that we need not
choose our τ term in the form τx^n. We can remove the overdetermination
equally well by a right side of the form

\rho(x) = \tau p_n(x)    (7-12.17)

where p_n(x) may be an arbitrary polynomial. This provides us with a
powerful opportunity to improve on the accuracy of the Taylor
expansion. The error term

\rho(x) = \tau x^n    (7-12.18)


which would lead to the truncated Taylor expansion, has the property 
that it remains exceedingly small in the neighborhood of x = 0 but 
then increases very strongly in the neighborhood of x = 1. Hence 
we have satisfied the given differential equation with excessive 
accuracy in the neighborhood of the origin x = 0 but the price we 
have to pay is that the error increases practically exponentially as we 
come near to the point x = 1. 

Let us agree that we have introduced already a scale factor which 
normalizes our range of interest to [0,1]. We will undoubtedly fare 
much better if we replace the error term (18) by the term 


\rho(x) = \tau T_n^*(x)    (7-12.19)

because the Chebyshev polynomial T_n^*(x) oscillates in the range
[0,1] with equal amplitudes. Hence we have adjusted our error 
throughout our interval to a balanced quantity which is neither too 
small nor too large at any point of the range. At no neighborhood 
will we get excessive accuracy. But at no neighborhood will we get 
a very large error which would make our series diverge. The 
accuracy we obtain is equivalent to the accuracy obtained by the 
“ideal” Fourier coefficients (7.6), but the new coefficients are
evaluated by simple algebraic recursions, without any integrations. 
The coefficients of the expansion are of course no longer “rigid” 
coefficients like the coefficients of the Taylor series which form a 
single row of numbers. If we change n, the order of the approximating
polynomial, we also change completely the coefficients of
T_n^*(x) since T_n^*(x) is now replaced by T_{n+1}^*(x). But this means that in
the successive recursions we get an entirely new set of coefficients for 
every n (cf. § 5). We adjust our coefficients to the range in which we 




want our function and to the number of powers with which we want 
to operate. The result of this flexibility is threefold:

1. We can increase the convergence of the Taylor expansion, 
that is, we can make the error of our approximation much smaller 
than the error of the Taylor expansion with the same number of 
terms. 

2. We can obtain a convergent expansion in cases when the 
Taylor expansion diverges in the given range, either from the 
beginning or from a certain point on. All the so-called “asymptotic” 
expansions of the customary transcendentals of mathematical 
physics can thus be transformed from divergent to convergent 
expansions. 

3. We can obtain a convergent expansion in cases when the 
Taylor series does not exist at all. 
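
Since everything that follows rests on the coefficients of T_n^*(x), a short sketch of how they may be generated is given here. It is an illustration only: the helper names `shifted_chebyshev` and `horner` are ours, and the recurrence T_{k+1}^*(x) = (4x - 2) T_k^*(x) - T_{k-1}^*(x) with T_0^* = 1, T_1^* = 2x - 1 is assumed as the definition of the shifted polynomials.

```python
import math


def shifted_chebyshev(n):
    """Coefficients of T_n^*(x) = T_n(2x - 1), lowest power first."""
    prev, cur = [1], [-1, 2]
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] * (len(cur) + 1)
        for i, c in enumerate(cur):
            nxt[i + 1] += 4 * c    # 4x * T_k^*
            nxt[i] -= 2 * c        # -2 * T_k^*
        for i, c in enumerate(prev):
            nxt[i] -= c            # -T_(k-1)^*
        prev, cur = cur, nxt
    return cur


def horner(coeffs, x):
    """Evaluate a polynomial given by its coefficients, lowest power first."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


c4 = shifted_chebyshev(4)
grid = [k / 200 for k in range(201)]
max_cheb = max(abs(horner(c4, x)) for x in grid)   # error term (19): stays within 1
max_power = max(x ** 4 for x in grid)              # error term (18): tiny near 0, 1 at x = 1
print(c4, max_cheb, max_power)
```

The polynomial value can be compared with cos(nθ), cos θ = 2x - 1, which is the property responsible for the equal amplitudes of the error term (19) throughout [0,1].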


13. The canonical polynomials. If we operate with “flexible”
coefficients, the following objection can be raised. Let us decide 
that we are not satisfied with a certain order of approximation. 
Then, if we go from n to n + 1, we have to throw away the previous 
results completely and start the entire calculation again. We will now 
develop a method which obviates this objection. 

Instead of putting an error term of the form τT_n^*(x) on the right
side, we will put the single power x^m on the right side and solve the
resultant coefficient equations, assuming a finite expansion. But then,
enumerating the number of equations to be solved, we find that the
number of conditions surpasses the number of available parameters
by μ. We relieve this overdetermination by dropping the first μ
equations for the time being. We consider them as a kind of “initial
conditions” which will be taken care of later.

To illustrate the situation, let us examine the differential equation 


x^3 y' - 2y = 0    (7-13.1)

We would encounter this equation if our aim were to obtain a power
expansion for the function

y = e^{-1/x^2}    (7-13.2)


The recurrence relations for the coefficients demand n + 3 equations 
for n + 1 coefficients. The number of surplus equations is thus two. 
Hence we will drop two equations in the beginning in order to make 
our system consistent. If we substitute the formal expansion 




(12.8) in our equation and tabulate the resulting coefficients, we 
obtain in succession 


-2b_0 = 0
\boxed{-2b_1 = 0}
\boxed{-2b_2 = 0}
b_1 - 2b_3 = 0
2b_2 - 2b_4 = 0    (7-13.3)
3b_3 - 2b_5 = 0
\cdots

We relieve the overdetermination by omitting the two boxed
equations.

Now the error term x^m on the right side of the original equation
means the appearance of a single “one” on the right side of the
equations (3). This 1 appears in the equation belonging to the power
x^m. By assigning to m the successive values 0, 1, 2, …, we let this
solitary 1 glide down from the top to the bottom. Since, however, in
our problem the second and third equations have been obliterated, we
avoid the indices m = 1 and m = 2.

What we obtain in this manner is a definite set of polynomials,
inherently associated with the given differential operator. We will
call them “canonical polynomials” and denote them by Q_m(x).
This Q_m(x) is not necessarily of the order m. The index m refers
merely to the fact that if the operation indicated by the left side of the
given differential equation is performed on the polynomial Q_m(x)
(omitting the surplus equations), the result is x^m. In our problem for
example we will refrain from defining Q_1(x) and Q_2(x), but the
remaining Q_m(x) are uniquely determined:


Q_0(x) = -\frac{1}{2}
Q_3(x) = x
Q_4(x) = \frac{1}{2} x^2
Q_5(x) = \frac{1}{3}(2x + x^3)    (7-13.4)
Q_6(x) = \frac{1}{4}(x^2 + x^4)
Q_7(x) = \frac{1}{15}(4x + 2x^3 + 3x^5)
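
The step-by-step construction can be sketched in a few lines of Python. This is an illustration under our own assumptions: the names `canonical_polynomials` and `apply_D` are hypothetical, and the recursion uses D(x^k) = k x^{k+2} - 2x^k for the operator Dy = x^3 y' - 2y, with the dropped powers x and x^2 simply ignored.

```python
from fractions import Fraction


def canonical_polynomials(m_max):
    """Q_m for D y = x^3 y' - 2 y, as coefficient lists (lowest power first).

    The indices m = 1 and m = 2 belong to the dropped equations and are skipped."""
    Q = {0: [Fraction(-1, 2)]}                 # D(-1/2) = 1
    for k in range(1, m_max - 1):
        # D(x^k) = k x^(k+2) - 2 x^k, hence Q_(k+2) = (x^k + 2 Q_k) / k.
        # For k = 1, 2 the power x^k falls on a dropped equation: Q_k counts as zero.
        base = Q.get(k, [])
        poly = [Fraction(0)] * (k + 1)
        for i, c in enumerate(base):
            poly[i] += 2 * c
        poly[k] += 1
        Q[k + 2] = [c / k for c in poly]
    return Q


def apply_D(poly):
    """Coefficients of x^3 y' - 2 y for the polynomial y."""
    out = [Fraction(0)] * (len(poly) + 2)
    for i, c in enumerate(poly):
        out[i] -= 2 * c
        out[i + 2] += i * c
    return out


Q = canonical_polynomials(7)
image_of_Q5 = apply_D(Q[5])    # x^5 plus a term in the dropped power x
print(Q[7], image_of_Q5)
```

Applying the operator to Q_5 returns x^5 together with a multiple of x, i.e., a term that falls on one of the omitted equations, which is exactly the sense in which Q_m(x) reproduces x^m.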




Another interesting example is the differential equation

x y' - y - x = 0    (7-13.5)

which defines the function

y = x \log x    (7-13.6)


apart from the additive term cx. Here the mere enumeration of the 
equations shows no overdetermination, since we get n + 1 equations 
for n + 1 coefficients, and the system is inhomogeneous. But the 
second coefficient equation expresses the contradictory statement 
—1 = 0. We consider this equation as overdetermination and drop 
it from our system. 

In this example the canonical polynomials become 


Q_0(x) = -1
Q_2(x) = x^2
Q_3(x) = \frac{x^3}{2}    (7-13.7)
Q_m(x) = \frac{x^m}{m - 1}


In all cases, step-by-step construction of the canonical polynomials 
can occur without difficulty, even if we cannot always write down 
their general expression in such simple manner as in the last 
example. 

These canonical polynomials put us in the position to obtain the 
solution of the differential equation Dy = 0, if an error term of the 
form (12.19) is put on the right side. We make use of the superposition
principle of linear operators. Since T_n^*(x) can be written out
in its actual coefficients,

T_n^*(x) = \sum_{m=0}^{n} c_n^m x^m    (7-13.8)

the solution of

Dy = \tau T_n^*(x)    (7-13.9)

becomes

y = \tau \sum_{m=0}^{n} c_n^m Q_m(x)    (7-13.10)


The solution appears in explicit form as a linear superposition of 
a rigid set of polynomials, viz., the canonical polynomials which are 
uniquely associated with the given differential operator. The 
coefficients of the Chebyshev polynomials appear as mere weight 
factors of these polynomials. Step-by-step solution of the coefficient 
equations separately for each n is now avoided. The solution (10) 
may be put in the following operational form 


y = \tau T_n^*(Q(x))    (7-13.11)

with the understanding that in the formal expansion of T_n^*(x) the
power Q^m is replaced by Q_m(x).

The freedom of τ can now be used for satisfying the surplus
equation which was originally omitted. If we had to omit more than
one equation, we need more than one τ term on the right side. In
the case of two surplus equations, for example, we will need the
error terms

\rho(x) = \tau_1 T_n^*(x) + \tau_2 T_{n+1}^*(x)

which gives rise to two τ factors, to be determined from the two
remaining conditions. In actual practice it will be more convenient,
and practically equally effective, to operate with only one set of
Chebyshev coefficients. We can do that by using an error term of
the following form

\rho(x) = (\tau_1 + \tau_2 x) T_n^*(x)    (7-13.12)

The explicit solution (13.10) now becomes

y_n = \sum_{m=0}^{n} c_n^m [\tau_1 Q_m(x) + \tau_2 Q_{m+1}(x)]    (7-13.13)

The remaining two equations give two simultaneous linear equations
for τ_1 and τ_2 which can be solved without difficulty.




We carry through the program for the case of the differential
equation (5). Here the solution becomes

y_n = \tau \left[ -c_n^0 + \sum_{m=2}^{n} c_n^m \frac{x^m}{m - 1} \right] + cx    (7-13.14)

where c is arbitrary. The equation we have omitted becomes

-1 = \tau c_n^1    (7-13.15)

from which

\tau = -\frac{1}{c_n^1} = \frac{(-1)^n}{2n^2}    (7-13.16)

The constant c can be determined from the boundary condition

y(1) = 0    (7-13.17)


This solution gives ever-improving approximations of the function
y = x log x in the form of a power expansion with flexible
coefficients. No Taylor series exists in this case, in view of the
analytical singularity at the point x = 0. But the expansion (14)
converges to y = x log x uniformly throughout the domain [0,1].
Moreover, the convergence is satisfactory even for small n, since the
maximum error is bounded by |τ|, i.e., by 1/(2n^2). For example, the
choice n = 4 gives the expansion

y_4 = -\frac{1}{32} + \frac{1}{32}\left(160x^2 - \frac{256}{2} x^3 + \frac{128}{3} x^4\right) - \frac{221}{96} x    (7-13.18)

    = \frac{1}{96}(-3 - 221x + 480x^2 - 384x^3 + 128x^4)

with a maximum error of ±0.03.
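
The n = 4 expansion and its error bound can be reproduced numerically. The sketch below is illustrative; the names `xlogx_approx` and `evaluate` are ours, and the shifted Chebyshev coefficients are generated by the standard three-term recurrence, which is assumed rather than quoted from the text.

```python
import math
from fractions import Fraction


def shifted_chebyshev(n):
    """Coefficients of T_n^*(x) = T_n(2x - 1), lowest power first."""
    prev, cur = [1], [-1, 2]
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] * (len(cur) + 1)
        for i, c in enumerate(cur):
            nxt[i + 1] += 4 * c
            nxt[i] -= 2 * c
        for i, c in enumerate(prev):
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def xlogx_approx(n):
    """tau and the coefficients of y_n(x) for x y' - y - x = tau T_n^*(x), n >= 1."""
    c = shifted_chebyshev(n)
    tau = Fraction(-1, c[1])                  # omitted equation: -1 = tau * c_n^1
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[0] = -tau * c[0]                   # tau * c_n^0 * Q_0 with Q_0 = -1
    for m in range(2, n + 1):
        coeffs[m] = tau * c[m] / (m - 1)      # tau * c_n^m * Q_m with Q_m = x^m / (m - 1)
    coeffs[1] = -sum(coeffs)                  # free multiple of x, fixed by y(1) = 0
    return tau, coeffs


def evaluate(coeffs, x):
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + float(c)
    return acc


tau, coeffs = xlogx_approx(4)
grid = [k / 1000 for k in range(1001)]
max_err = max(abs(evaluate(coeffs, x) - (x * math.log(x) if x > 0 else 0.0)) for x in grid)
print(tau, coeffs, max_err)
```

On this grid the largest deviation occurs at x = 0, where it equals |τ| = 1/32.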

Use of the canonical polynomials Q_m(x) is advocated from still
another viewpoint. It makes it possible to adjust the solution to
arbitrary ranges. Let us assume that we want a solution in the range
[0,α], instead of [0,1]. This α may be any given real or complex
number, provided that the ray [0,α] does not include any singular
point of the function y(x). Now this readjustment of the range is
immediately accomplished by merely replacing T_n^*(x) by T_n^*(x/α).
This means that the coefficients c_n^m are to be replaced by c_n^m α^{-m}.
Hence the solution (10) will now appear in the form

y(x) = \tau \sum_{m=0}^{n} c_n^m \alpha^{-m} Q_m(x)    (7-13.19)

where x is any point along the complex ray [0,α]. If we go to the
end point of the range x = α, we obtain a solution of the given
differential equation at an arbitrary point z of the complex plane,
compatible with the general regularity conditions. The solution
appears in the form

y(z) = \tau \sum_{m=0}^{n} c_n^m z^{-m} Q_m(z) = \tau T_n^*\left(\frac{Q(z)}{z}\right)    (7-13.20)

and we have overcome the restriction that x has to lie between 0
and 1. However, this solution will generally not be a polynomial any
more but the ratio of two polynomials (cf. § 14, Example 5).


14. Examples for the τ method. The τ method has such wide
fields of applications that it will be of value to discuss a few character- 
istic examples which demonstrate the power of the method. The 
problem of obtaining suitable error bounds will be left to the next 
section. 

The first example deals with a function whose Taylor expansion is 
well convergent. We show how the application of the τ method
greatly increases the degree of convergence. In the second example 
we show how the realm of convergence of the Taylor series can be 
extended to a wider range. The third example demonstrates how a 
completely divergent “asymptotic” expansion can be changed into a 
convergent expansion. The fourth example deals with a function 
whose Taylor expansion does not exist. The fifth example illustrates 
the extension of the method to the complex domain. 

Example 1. We consider the function

y = e^x    (7-14.1)

defined by the differential equation

y - y' = 0    (7-14.2)

with the boundary condition

y(0) = 1    (7-14.3)


We have n + 2 equations for n + 1 coefficients. The equations 
become compatible by merely dropping the boundary condition. 
The canonical polynomials Q_m(x) of our problem become

Q_0(x) = 1
Q_1(x) = 1 + x
Q_2(x) = 2!\left(1 + x + \frac{x^2}{2!}\right)    (7-14.4)
Q_3(x) = 3!\left(1 + x + \frac{x^2}{2!} + \frac{x^3}{3!}\right)
Q_m(x) = m!\left(1 + x + \frac{x^2}{2!} + \cdots + \frac{x^m}{m!}\right) = m! S_m(x)

where

S_m(x) = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^m}{m!}

represents the successive “partial sums” of the Taylor series. Hence
the solution of our problem becomes, according to (13.10),

y_n(x) = \tau \sum_{m=0}^{n} c_n^m m! S_m(x)    (7-14.5)

We now satisfy the condition (3) and obtain

y_n(x) = \frac{\sum_{m=0}^{n} c_n^m m! S_m(x)}{\sum_{m=0}^{n} c_n^m m!}    (7-14.6)


The solution appears as a weighted arithmetic mean of the partial sums
of the Taylor series. The successive convergents of e^x in the range
[0,1] thus become

y_0 = 1
y_1 = \frac{1 + 2x}{1}
y_2 = \frac{9 + 8x + 8x^2}{9}    (7-14.7)
y_3 = \frac{113 + 114x + 48x^2 + 32x^3}{113}
y_4 = \frac{1825 + 1824x + 928x^2 + 256x^3 + 128x^4}{1825}

The corresponding e values, putting x = 1, become

1, 3, 2.777, 2.7168, 2.71836, 2.71828068, …    (7-14.8)

(A dot under a digit indicates the first decimal in error.) The
convergence is strong in comparison with the much slower convergence
of the Taylor series:

1, 2, 2.5, 2.667, 2.70833, 2.71667,


yet the results are no match to the spectacular e approximations 
found in VI, 20. The previous approximations are analogous to 
the present ones and can be obtained by the present procedure, 
provided that the coefficients c_n^m of the Chebyshev polynomials are
replaced by the corresponding coefficients of the Legendre polynomials
P_n^*(x). Since we have decided that the Chebyshev polynomials
give a smaller error than the Legendre polynomials, it seems
strange that the previous e values are so much superior to the new
ones. However, we have to consider the entire range of x between
0 and 1. The smallness of the error at the end point x = 1 does not
guarantee that the error will be small everywhere. Moreover, the
choice of T_n^*(x) as error term of the differential equation, while
securing good convergence, will not necessarily give the best possible
convergence. What we want to achieve is an even oscillation of
the error in the function y = f(x). The operation Dy may considerably
alter this uniform character of the error. By proper study
of the given differential operator we may construct an error term
ρ(x) which is better than the choice τT_n^*(x).
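
The weighted mean (6) is easy to reproduce in exact arithmetic. The following sketch is illustrative only; the names `partial_sum_exp` and `tau_exp` are ours, and the coefficients of T_n^*(x) are generated by the usual three-term recurrence.

```python
from fractions import Fraction
from math import factorial


def shifted_chebyshev(n):
    """Coefficients of T_n^*(x) = T_n(2x - 1), lowest power first."""
    prev, cur = [1], [-1, 2]
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] * (len(cur) + 1)
        for i, c in enumerate(cur):
            nxt[i + 1] += 4 * c
            nxt[i] -= 2 * c
        for i, c in enumerate(prev):
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def partial_sum_exp(m, x):
    """S_m(x): the Taylor series of e^x truncated after the power x^m."""
    return sum(Fraction(x) ** k / factorial(k) for k in range(m + 1))


def tau_exp(n, x):
    """Weighted mean (6) of the partial sums, with weights c_n^m * m!."""
    c = shifted_chebyshev(n)
    num = sum(c[m] * factorial(m) * partial_sum_exp(m, x) for m in range(n + 1))
    den = sum(c[m] * factorial(m) for m in range(n + 1))
    return num / den


e_values = [tau_exp(n, 1) for n in range(5)]
print([float(v) for v in e_values])
```

At x = 1 the first five convergents come out as 1, 3, 25/9, 307/113, and 4961/1825, in agreement with (7) and (8).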

For example, in our problem we can see without difficulty that
the dominant term in the given differential equation is -y', since
the differentiation of a high-frequency oscillation of the type
cos nθ magnifies the amplitude by the factor n. Hence in this
problem it would be preferable to put

\rho(x) = \tau T_{n+1}^{*\prime}(x)    (7-14.9)

because, assuming that the polynomial approximation y_n(x) is of
the form

y_n(x) = e^x - \tau T_{n+1}^*(x)    (7-14.10)

the operation y - y' will generate (in good approximation) the
error term (9). The solution (5) will now hold again, but the c_n^m
have to be replaced by the coefficients of T_{n+1}^{*\prime}(x), called “Chebyshev
polynomials of the second kind.” Furthermore, the estimated
solution (10) indicates that we should not satisfy the boundary
condition y(0) = 1 exactly. We should rather determine τ from the
condition

y_n(0) = 1 - \tau T_{n+1}^*(0) = 1 - \tau(-1)^{n+1}    (7-14.11)

This gives

\tau = \left[(-1)^{n+1} + \sum_{m=0}^{n} c_n^m m!\right]^{-1}    (7-14.12)

and thus the final solution becomes

y_n(x) = \frac{\sum_{m=0}^{n} c_n^m m! S_m(x)}{(-1)^{n+1} + \sum_{m=0}^{n} c_n^m m!}    (7-14.13)

This is again a weighted arithmetic mean of the partial sums of the
Taylor series, but the weight factors have changed. The first five
convergents of e^x in the range [0,1] now become

y_0 = \frac{2}{1}
y_1 = \frac{8 + 16x}{9}
y_2 = \frac{114 + 96x + 96x^2}{113}    (7-14.14)
y_3 = \frac{1824 + 1856x + 768x^2 + 512x^3}{1825}
y_4 = \frac{36690 + 36640x + 18720x^2 + 5120x^3 + 2560x^4}{36689}

The estimated maximum error at any point of the range does not 
exceed the reciprocal of the denominator. Hence the quadratic 
approximation is bounded by ±0.01, and the cubic approximation
is better than ±0.0006. The first 6 e values become


2, 2.667, 2.70796, 2.717808, 2.718253, 2.71828075 


The “best” nature of this approximation is not invalidated by the 
superiority of the remarkable approximations (6-20.4) and (6-20.5) 
obtained on the basis of the quadrature method described in V, 22. 
These are end point approximations, and one can show that in our 
problem the Legendre polynomials are superior to the Chebyshev 
polynomials from the standpoint of minimizing the error of y(1) at 
the end point of the range. 
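
The modified weighting can be checked in the same way. In this sketch (illustrative; the name `tau_exp_derivative` is ours) the weights are the coefficients of the derivative of T_{n+1}^*(x), and the denominator carries the extra term (-1)^{n+1} that follows from determining τ as described above.

```python
from fractions import Fraction
from math import factorial


def shifted_chebyshev(n):
    """Coefficients of T_n^*(x) = T_n(2x - 1), lowest power first."""
    prev, cur = [1], [-1, 2]
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] * (len(cur) + 1)
        for i, c in enumerate(cur):
            nxt[i + 1] += 4 * c
            nxt[i] -= 2 * c
        for i, c in enumerate(prev):
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def tau_exp_derivative(n, x):
    """Weighted mean of Taylor partial sums with weights from d/dx T_(n+1)^*(x)."""
    t = shifted_chebyshev(n + 1)
    c = [(m + 1) * t[m + 1] for m in range(n + 1)]    # derivative coefficients
    s = Fraction(0)
    num = Fraction(0)
    den = Fraction((-1) ** (n + 1))
    for m in range(n + 1):
        s += Fraction(x) ** m / factorial(m)          # running partial sum S_m(x)
        num += c[m] * factorial(m) * s
        den += c[m] * factorial(m)
    return num / den


e_values = [tau_exp_derivative(n, 1) for n in range(5)]
print([float(v) for v in e_values])
```

The values at x = 1 begin 2, 24/9, 306/113, 4960/1825, which are the end point values of the convergents listed above.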
Example 2. We consider the function

y = \frac{1}{a + x}    (7-14.15)

defined by the algebraic equation

(a + x)y - 1 = 0    (7-14.16)


Instead of using the canonical polynomials, we will deal with this
problem in a different manner, in order to illustrate a method which
in some problems is more adequate. Instead of expanding y(x) into
powers, we will immediately arrange our expansion in Chebyshev
polynomials

y = \tfrac{1}{2} c_0 T_0^*(x) + c_1 T_1^*(x) + \cdots + c_{n-1} T_{n-1}^*(x)    (7-14.17)

As usual, we put a τ term on the right side, to insure compatibility:

(a + x)y = 1 + \tau T_n^*(x)    (7-14.18)

In this problem we can obtain τ in advance. If we substitute
x = -a, we get zero on the left side. Hence

\tau = -\frac{1}{T_n^*(-a)}    (7-14.19)

We now substitute the expansion (17) on the left side and perform
the multiplication by (a + x). For this purpose we make use of the
recurrence relation (7.7), which for the shifted polynomials T_k^*(x)
becomes

(4x - 2) T_k^*(x) = T_{k+1}^*(x) + T_{k-1}^*(x)    (7-14.20)

In order to use this relation we multiply our equation (18) by 4
and put

4a + 2 = 2b    (7-14.21)

[2b + (4x - 2)]y = 4 + 4\tau T_n^*(x)    (7-14.22)


We tabulate the coefficients of the expansion, obtaining in succession:

              T_0^*    T_1^*     T_2^*     T_3^*
Left side:    b c_0    2b c_1    2b c_2    2b c_3
                       c_0       c_1       c_2
              c_1      c_2       c_3       c_4
Right side:   4        0         0         0

This gives the equations

b c_0 + c_1 = 4
c_0 + 2b c_1 + c_2 = 0
c_1 + 2b c_2 + c_3 = 0    (7-14.23)
\cdots




Let us disregard for the time being the first equation. The remaining
homogeneous equations are solvable by an assumption of the
following form

c_k = C p^k    (7-14.24)

All the equations are then reduced to one quadratic equation for p

1 + 2bp + p^2 = 0    (7-14.25)

which gives the two roots

p = -b \pm \sqrt{b^2 - 1} = -(2a + 1) \pm 2\sqrt{a(a + 1)}    (7-14.26)

In view of the superposition principle of the solution of linear
systems, we thus get

c_k = C_1 p_1^k + C_2 p_2^k    (7-14.27)

where p_1 and p_2 are the two roots, corresponding to the ± sign in
(26). The two constants C_1 and C_2 are so far arbitrary. But the
condition c_n = 0 gives

C_1 p_1^n + C_2 p_2^n = 0    (7-14.28)

while the first equation of the system (23) demands

C_1 p_1 + C_2 p_2 + b(C_1 + C_2) = 4    (7-14.29)

The quadratic equation (25) shows that the product of the two
roots p_1 and p_2 is 1. We will call p_1 the absolutely smaller and p_2
the absolutely larger root. Now the relation (28) gives

C_2 = -C_1 \left(\frac{p_1}{p_2}\right)^n    (7-14.30)

If n grows very large, C_2 converges to zero. We simplify matters
without making the error essentially larger if we drop C_2 altogether.
This means that we establish immediately the infinite expansion into
the T_k^*(x), i.e., the Fourier series if we think in terms of the variable θ
[cf. (8.3)], and then truncate this series to n terms. Our solution is
then simply

C_1 = \frac{4}{b + p} = \frac{2}{\sqrt{a(a + 1)}}    (7-14.31)

with

p = 2\sqrt{a(a + 1)} - (2a + 1)    (7-14.32)

assuming that a is positive.




Generally a may be any real or complex constant. The solution
of our problem is quite generally the infinite expansion

\frac{1}{a + x} = \frac{2}{\sqrt{a(a + 1)}} \left[\tfrac{1}{2} T_0^*(x) + p T_1^*(x) + p^2 T_2^*(x) + \cdots\right]    (7-14.33)

where p is the absolutely smaller of the two roots (26). If this series
is terminated after n terms, the remainder of the series can be
estimated on the basis of the fact that all T_k^*(x) oscillate between ±1:

|\eta_n| \le \frac{2}{|\sqrt{a(a + 1)}|} \left(|p|^n + |p|^{n+1} + \cdots\right)

|\eta_n| \le \frac{2|p|^n}{|\sqrt{a(a + 1)}|} \cdot \frac{1}{1 - |p|}    (7-14.34)

Since p_1 p_2 = 1, one of the roots must always remain smaller than 1
in absolute value except in the limiting case when both p_1 and p_2
lie on the unit circle and p_2 is the complex conjugate of p_1. This is
possible only if a is a real negative number between 0 and -1. In
that case y(x) has a singularity in the critical interval [0,1]. In all
other cases the series (33) converges, and the convergence is even
absolute.

Compare this result with the convergence of the Taylor series
which extends only up to x = |a| and becomes divergent beyond
that point. In both cases the expansion has the character of a
geometric series, and the estimated maximum error η_n at any point
of the range has the form (34). But the p of this formula has in
the two cases widely different values. The following numerical
table makes a comparison between the two p values for a wide
range of the parameter a, assuming that a is a real positive
number.

  a        Taylor     T_k^*(x)
           p          -p
  3        0.333      0.072
  2        0.5        0.101
  1        1          0.172      (7-14.35)
  0.5      2          0.268
  0.333    3          0.333
  0.2      5          0.420

Of particular interest is the case

y = \frac{1}{1 + x} = 1 - x + x^2 - x^3 + \cdots
  = \sqrt{2}\left[\tfrac{1}{2} - 0.1716 T_1^*(x) + (0.1716)^2 T_2^*(x) - (0.1716)^3 T_3^*(x) + \cdots\right]    (7-14.36)

The quotient of the second expansion is deduced from

3 - 2\sqrt{2} = 0.1716 = \frac{1}{5.83}

Compared with the Taylor series we have gained the large factor
5.83. The rough overall estimate¹ suggests the factor 4, and that
value is actually correct for large values of a where the Taylor series
is still well convergent, as we can demonstrate by the case a = 3.
For decreasing a the gain increases slowly and reaches at the limit
of convergence of the Taylor series the maximum value

3 + 2\sqrt{2} = 5.8284…

Below a = 1 the Taylor series does not hold in the entire interval
[0,1], while the T_k^*(x) expansion retains its convergence.
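
The geometric character of the Chebyshev expansion can be verified numerically. The sketch below is illustrative; the function names `cheb_geometric` and `remainder_bound` are ours, a is assumed real and positive, and T_k^*(x) is evaluated through cos kθ with cos θ = 2x - 1.

```python
import math


def cheb_geometric(a, x, n):
    """Sum of the first n terms (k = 0 .. n-1) of the expansion (33) of 1/(a + x)."""
    root = math.sqrt(a * (a + 1))
    p = 2 * root - (2 * a + 1)            # absolutely smaller root, formula (32)
    theta = math.acos(2 * x - 1)          # T_k^*(x) = cos(k * theta)
    total = 0.5                           # the term (1/2) T_0^*
    for k in range(1, n):
        total += p ** k * math.cos(k * theta)
    return 2 / root * total, p


def remainder_bound(a, n):
    """Right side of the estimate (34) for real positive a."""
    root = math.sqrt(a * (a + 1))
    p = 2 * root - (2 * a + 1)
    return 2 * abs(p) ** n / (root * (1 - abs(p)))


approx, p = cheb_geometric(1.0, 0.75, 8)
error = abs(approx - 1 / 1.75)
print(p, approx, error, remainder_bound(1.0, 8))
```

For a = 1 the quotient is p = 2√2 - 3 = -0.1716, and eight terms already reproduce 1/(1 + x) at x = 0.75 to within the bound (34).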

Example 3. This example is chosen because it illustrates the
manner in which the τ method transforms a completely divergent
expansion into a strictly convergent one. We consider the
“exponential integral”

w(z) = \int_z^{\infty} \frac{e^{-t}}{t} dt    (7-14.37)

encountered in (3.1). As before, we write

w(z) = \frac{e^{-z}}{z} u(z)    (7-14.38)

and obtain for the new function u(z) the differential equation

-z u' + (1 + z) u = z    (7-14.39)

We now introduce the reciprocal of z as a new variable

z = \frac{1}{x}    (7-14.40)


1 See footnote to p. 453. 


and obtain the new differential equation

x^2 y' + (1 + x) y = 1    (7-14.41)


If we put for y(x) a formal power expansion, we obtain the following 
infinite series. 


y(x) = 1 - x + 2! x^2 - 3! x^3 + 4! x^4 - \cdots    (7-14.42)


The variable x of this expansion is in actual fact the reciprocal of the
original variable z. Hence we can conceive the expansion (42) as
the formal antipode of the expansion

e^{-z} = 1 - z + \frac{z^2}{2!} - \frac{z^3}{3!} + \cdots    (7-14.43)

inasmuch as every coefficient is the exact reciprocal of the
corresponding coefficient of the latter expansion. While the series (43)
is exceedingly well convergent (the convergence being preserved for
any value of z), the series (42) is exceedingly well divergent; the
convergence is lost for even arbitrarily small values of x, except for
the trivial value x = 0. Such a series is nevertheless of great value,
since one can show that the error of the truncated series is smaller
than the first neglected term. Hence we use only the decreasing part
of the series, stopping at the proper term. We choose this term by the
condition that the first neglected term shall be smaller than the term
which follows it. By this method we obtain good approximation for
sufficiently small values of x but with increasing x the series
gradually loses its value.

We will now treat this problem by the τ method. We drop the
first equation for the purpose of compatibility and establish the
canonical polynomials Q_m(x), starting with m = 1. We thus find

Q_m(x) = \frac{(-1)^{m-1}}{m!} S_{m-1}(x)    (m = 1, 2, \ldots)    (7-14.44)

where S_m(x) denotes the mth partial sum of the series (42), and the
τ solution becomes

y_n(x) = \tau \sum_{m=1}^{n} (-1)^{m-1} c_n^m \frac{S_{m-1}(x)}{m!}    (7-14.45)

We now satisfy the temporarily omitted first equation.

b_0 = 1 + \tau c_n^0    (7-14.46)


This gives

\tau = -\frac{1}{\sum_{m=0}^{n} (-1)^m c_n^m \frac{1}{m!}}    (7-14.47)

and the complete solution becomes

y_n(x) = \frac{\sum_{m=1}^{n} (-1)^m c_n^m \frac{S_{m-1}(x)}{m!}}{\sum_{m=0}^{n} (-1)^m c_n^m \frac{1}{m!}}    (7-14.48)

It is of interest to make a comparison between this solution and the 
solution (6) found in the case of the exponential function. In both 
cases the solution appears as a weighted arithmetic mean of the 
partial sums of the Taylor series. The essential difference is that the 
factor m! appeared previously in the numerator of each term, while 
now it appears in the denominator. Previously the partial sums of 
high order were strongly emphasized; now they are strongly
de-emphasized. Use of the truncated Taylor expansion of the order n
corresponds to the extreme weighting 0, 0, …, 0, 1. In the case of the
well-convergent e^x function this weighting is not damaging and not
too far from the weighting of the more effective τ series, which is
much less extreme than the weighting of the Taylor series but still
not too dissimilar to it, by putting the center of importance out to
the large m. In the second case the situation is quite different. Here
the influence of the high-order terms is cut down by small weight
factors and the center of importance is on the partial sums of low
order. This corresponds to the customary method of going with the
partial sums only up to a point and being satisfied with an error which
we cannot further diminish. Here, however, we need not stop with
any definite n. The higher-order partial sums are not thrown away,
but their damaging influence is cut down by a proper factor. The
higher-order sums serve as correction terms and in actual fact we
come closer and closer to the correct functional value at any point of
the range, as n increases more and more. Whether the original
S_m(x) converge in themselves or not is irrelevant. The proper
weighting establishes their convergence in every case.


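The first five convergents of our problem become

y_1 = \frac{2}{1 + 2} = \frac{2}{3}    (7-14.49)

y_2 = \frac{8 + \frac{8}{2!}(1 - x)}{1 + 8 + \frac{8}{2!}} = \frac{12 - 4x}{13}

y_3 = \frac{18 + \frac{48}{2!}(1 - x) + \frac{32}{3!}(1 - x + 2x^2)}{1 + 18 + \frac{48}{2!} + \frac{32}{3!}} = \frac{142 - 88x + 32x^2}{145}

y_4 = \frac{32 + \frac{160}{2!}(1 - x) + \frac{256}{3!}(1 - x + 2x^2) + \frac{128}{4!}(1 - x + 2x^2 - 6x^3)}{1 + 32 + \frac{160}{2!} + \frac{256}{3!} + \frac{128}{4!}}
    = \frac{160 - 128x + 96x^2 - 32x^3}{161}

y_5 = \frac{50 + \frac{400}{2!}(1 - x) + \frac{1120}{3!}(1 - x + 2x^2) + \frac{1280}{4!}(1 - x + 2x^2 - 6x^3) + \frac{512}{5!}(1 - x + 2x^2 - 6x^3 + 24x^4)}{1 + 50 + \frac{400}{2!} + \frac{1120}{3!} + \frac{1280}{4!} + \frac{512}{5!}}
    = \frac{7414 - 6664x + 7328x^2 - 5184x^3 + 1536x^4}{7429}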

The estimated accuracy (cf. § 15) of the solution is such that the
maximum error at any point of the range cannot surpass |τ|. Hence


|η| < 0.33, 0.077, 0.021, 0.0062, 0.0021, 0.00070, …


At the end point x = 1 of the range we obtain the following con- 
vergents for y(1) = 0.596358414. 


0.667, 0.6154, 0.5931, 0.59627, 0.596312, 0.5963666, …    (7-14.50)


The asymptotic series tells us at this point only that the functional 

value must lie between 0 and 1. The τ solution, on the other hand,
gives the functional value, as n increases, with arbitrary accuracy.
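
The convergents of this example can be generated in exact arithmetic. The sketch below is illustrative; the names `asymptotic_partial_sum` and `tau_expint` are ours, and the weights are the quantities (-1)^m c_n^m / m! built from the coefficients of T_n^*(x).

```python
from fractions import Fraction
from math import factorial


def shifted_chebyshev(n):
    """Coefficients of T_n^*(x) = T_n(2x - 1), lowest power first."""
    prev, cur = [1], [-1, 2]
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] * (len(cur) + 1)
        for i, c in enumerate(cur):
            nxt[i + 1] += 4 * c
            nxt[i] -= 2 * c
        for i, c in enumerate(prev):
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def asymptotic_partial_sum(m):
    """Coefficients of S_m(x) = sum_{k=0}^{m} (-1)^k k! x^k."""
    return [Fraction((-1) ** k * factorial(k)) for k in range(m + 1)]


def tau_expint(n):
    """Coefficients (lowest power first) of the convergent y_n(x), n >= 1."""
    c = shifted_chebyshev(n)
    w = [Fraction((-1) ** m * c[m], factorial(m)) for m in range(n + 1)]
    den = sum(w)
    num = [Fraction(0)] * n
    for m in range(1, n + 1):
        for k, s in enumerate(asymptotic_partial_sum(m - 1)):
            num[k] += w[m] * s               # weight times S_(m-1)(x)
    return [v / den for v in num]


y2 = tau_expint(2)
end_values = [sum(tau_expint(n)) for n in range(1, 6)]   # y_n(1)
print(y2, [float(v) for v in end_values])
```

The end point values 2/3, 8/13, 86/145, 96/161, 4430/7429 are the first five entries of (50).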
Example 4. We will now obtain an expansion for the function 


y = \sqrt{x}    (7-14.51)




Since all derivatives of the function become infinite at the point x = 0, 
the Taylor series does not exist in this case. 
By logarithmic differentiation we find

\frac{y'}{y} = \frac{1}{2x}    (7-14.52)

Hence we see that we can characterize our function by the differential
equation

2xy' - y = 0    (7-14.53)

but we change this equation to

2xy' - y = \tau T_n^*(x)    (7-14.54)

The canonical polynomials Q_m(x) of our problem become

Q_0(x) = -1
Q_1(x) = x
Q_2(x) = \frac{x^2}{3}    (7-14.55)
Q_m(x) = \frac{x^m}{2m - 1}

and thus

y_n(x) = \tau \sum_{m=0}^{n} c_n^m \frac{x^m}{2m - 1}    (7-14.56)

The factor τ can be determined by the boundary condition

y(1) = 1    (7-14.57)

Hence

y_n(x) = \frac{\sum_{m=0}^{n} c_n^m \frac{x^m}{2m - 1}}{\sum_{m=0}^{n} c_n^m \frac{1}{2m - 1}}    (7-14.58)
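
A numerical sketch of formula (58) follows. It is illustrative only; the names `sqrt_approx` and `evaluate` are ours. It forms the polynomial for a given n and measures the largest deviation from √x on a grid in [0,1].

```python
import math
from fractions import Fraction


def shifted_chebyshev(n):
    """Coefficients of T_n^*(x) = T_n(2x - 1), lowest power first."""
    prev, cur = [1], [-1, 2]
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] * (len(cur) + 1)
        for i, c in enumerate(cur):
            nxt[i + 1] += 4 * c
            nxt[i] -= 2 * c
        for i, c in enumerate(prev):
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def sqrt_approx(n):
    """tau and the coefficients of y_n(x) for 2 x y' - y = tau T_n^*(x), y(1) = 1."""
    c = shifted_chebyshev(n)
    weights = [Fraction(c[m], 2 * m - 1) for m in range(n + 1)]   # c_n^m / (2m - 1)
    tau = 1 / sum(weights)                                        # from y(1) = 1
    return tau, [tau * w for w in weights]


def evaluate(coeffs, x):
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + float(c)
    return acc


tau, coeffs = sqrt_approx(6)
grid = [k / 1000 for k in range(1001)]
max_err = max(abs(evaluate(coeffs, x) - math.sqrt(x)) for x in grid)
print(float(tau), max_err)
```

For n = 6 the factor |τ| comes out close to 1/(6π), and the grid error does not exceed |τ|.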


The error analysis of this problem is particularly interesting. The 
critical point is the point x = 0 where the function has an analytical 
singularity. Hence the largest errors will occur in the neighborhood 
of the origin. We will operate with the angle variable θ of § 8, which
is the adequate variable whenever the Chebyshev polynomials are
involved. For the sake of convenience, we replace θ by π − θ and
have accordingly

$$x = \sin^2 \frac{\theta}{2}, \qquad T_n^*(x) = (-1)^n \cos n\theta$$


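For small x we can put

$$x = \frac{\theta^2}{4} \tag{7-14.59}$$

and rewrite the differential equation (54) in terms of θ.

$$\theta y' - y = \bar\tau \cos n\theta \tag{7-14.60}$$

where $\bar\tau = (-1)^n \tau$. We solve this differential equation by the
method of the “variation of the constants.” We put

$$y = C\theta \tag{7-14.61}$$

$$C' = \bar\tau\, \frac{\cos n\theta}{\theta^2} \tag{7-14.62}$$

Integrating by parts,

$$C = -\bar\tau\, \frac{\cos n\theta}{\theta} - n\bar\tau \int \frac{\sin n\theta}{\theta}\, d\theta \tag{7-14.63}$$

Hence

$$y_n = -\bar\tau \cos n\theta - n\bar\tau\theta \left[ \int_0^\theta \frac{\sin n\theta}{\theta}\, d\theta + A \right] \tag{7-14.64}$$

where A is a constant of integration. Since, however, y(θ) must be a
polynomial in θ², a term of the form Aθ cannot occur. This shows
that A must be put equal to zero, and we obtain

$$y_n(\theta) = -\bar\tau \cos n\theta - \bar\tau\, n\theta\, \mathrm{Si}(n\theta) \tag{7-14.65}$$

The function Si(nθ) oscillates around the asymptotic value π/2 with
decreasing amplitudes. This shows that y(θ) oscillates around the
function

$$-\bar\tau\, n\, \frac{\pi}{2}\, \theta = -\bar\tau\, n\pi \sqrt{x}$$

Since the function we want to obtain is $\sqrt{x}$, we have to choose
$\bar\tau$ in such a way that the factor of $\sqrt{x}$ shall become 1.
This yields

$$\bar\tau = -\frac{1}{n\pi}, \qquad \text{i.e.} \qquad \tau = \frac{(-1)^{n+1}}{n\pi} \tag{7-14.66}$$

The error of the approximation $y_n(\theta)$ thus becomes

$$\eta_n(\theta) = \sqrt{x} - y_n(\theta) = -\frac{1}{n\pi} \left( \cos n\theta + n\theta \left[ \mathrm{Si}(n\theta) - \frac{\pi}{2} \right] \right) \tag{7-14.67}$$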


In order to obtain the points of maxima of the oscillatory error
$\eta_n(\theta)$, we put the derivative equal to zero. This gives the
condition

$$\mathrm{Si}(n\theta) = \frac{\pi}{2}$$

But at these points

$$\eta_n(\theta) = -\frac{1}{n\pi} \cos n\theta$$

and thus we see that

$$\eta_{\max} = \frac{1}{n\pi}$$

is an absolute error-bound for the entire interval. With increasing n
this $\eta_{\max}$ decreases slowly to zero. But even n = 6 gives an
accuracy of ±0.053; that is, a certain polynomial of 6th order
approximates $\sqrt{x}$ with a maximum error of ±0.053 in the entire
range between 0 and 1. By substituting in (56) this polynomial becomes

$$y_6(x) = \frac{1}{6\pi} \left[ 1 + 72x - \frac{840}{3}x^2 + \frac{3584}{5}x^3 - \frac{6912}{7}x^4 + \frac{6144}{9}x^5 - \frac{2048}{11}x^6 \right]$$

$$= 0.053 + 3.820x - 14.854x^2 + 38.028x^3 - 52.385x^4 + 36.217x^5 - 9.877x^6$$
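
The coefficients of this sixth-order polynomial, and the claimed error bound 1/(6π), can be reproduced numerically. The following is a sketch under stated assumptions: it takes τ = (-1)^(n+1)/(nπ) from (7-14.66) and the sum (7-14.56), generates the coefficients c_m^n of T_n*(x) by the standard three-term recurrence, and scans a uniform grid for the largest deviation from the square root. All function names are hypothetical.

```python
import math


def shifted_chebyshev_coeffs(n):
    """Ascending coefficients of T_n*(x) = T_n(2x - 1)."""
    prev, cur = [1], [-1, 2]
    if n == 0:
        return prev
    for _ in range(n - 1):
        # Recurrence T*_{k+1}(x) = (4x - 2) T*_k(x) - T*_{k-1}(x)
        nxt = [0] * (len(cur) + 1)
        for i, c in enumerate(cur):
            nxt[i] -= 2 * c
            nxt[i + 1] += 4 * c
        for i, c in enumerate(prev):
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def sqrt_tau_coeffs(n):
    """Coefficients of y_n(x) = tau * sum_m c_m x^m / (2m - 1)."""
    c = shifted_chebyshev_coeffs(n)
    tau = (-1) ** (n + 1) / (n * math.pi)
    return [tau * c[m] / (2 * m - 1) for m in range(n + 1)]


def evaluate(coeffs, x):
    """Evaluate a polynomial given by ascending coefficients."""
    return sum(b * x ** m for m, b in enumerate(coeffs))


coeffs6 = sqrt_tau_coeffs(6)
grid = [k / 1000 for k in range(1001)]
max_err = max(abs(math.sqrt(x) - evaluate(coeffs6, x)) for x in grid)
print([round(b, 3) for b in coeffs6])
print(max_err, 1 / (6 * math.pi))
```

On this grid the largest deviation is found at x = 0, where the polynomial takes the value of its constant term instead of zero.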


Before we came to the discussion of the error, we had already
a method for obtaining τ; we made use of the condition (57). Since
the error at the point x = 1 is small, we can expect that the two τ
determinations will differ but slightly. This gives the following
approximate relation which holds even for small n with remarkable
accuracy, and becomes exact for n = ∞.

$$\sum_{m=0}^{n} \frac{c_m^n}{2m - 1} \cong (-1)^{n+1}\, n\pi \tag{7-14.68}$$

For example in the case n = 6 the number π thus determined
becomes 3.1427, which means an accuracy of 1 : 1000. We will
return to this problem once more in the next section.
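
Relation (7-14.68) is easy to test for several n. The sketch below is illustrative only: it evaluates the sum of c_m^n/(2m - 1) exactly with rational numbers and divides by (-1)^(n+1) n, so that the result should approach π as n grows. The helper names are hypothetical.

```python
from fractions import Fraction
import math


def shifted_chebyshev_coeffs(n):
    """Ascending coefficients of T_n*(x) = T_n(2x - 1)."""
    prev, cur = [1], [-1, 2]
    if n == 0:
        return prev
    for _ in range(n - 1):
        # Recurrence T*_{k+1}(x) = (4x - 2) T*_k(x) - T*_{k-1}(x)
        nxt = [0] * (len(cur) + 1)
        for i, c in enumerate(cur):
            nxt[i] -= 2 * c
            nxt[i + 1] += 4 * c
        for i, c in enumerate(prev):
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def pi_estimate(n):
    """(-1)^(n+1) / n times the sum of c_m^n / (2m - 1)."""
    c = shifted_chebyshev_coeffs(n)
    total = sum(Fraction(c[m], 2 * m - 1) for m in range(n + 1))
    return float((-1) ** (n + 1) * total / n)


for n in (2, 4, 6, 8, 12):
    print(n, pi_estimate(n), pi_estimate(n) - math.pi)
```

Exact fractions are used because the coefficients of T_n*(x) grow quickly and alternate in sign, so a floating-point sum would lose digits through cancellation for larger n.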

Example 5. The following example is chosen to illustrate how
the τ method can serve for the precision approximation of functions
even in a complex range. The function

$$y = \operatorname{arc\,tan} z \tag{7-14.69}$$

can be defined by the differential equation

$$y' = \frac{1}{1 + z^2}$$

We put

$$y = z\,u(z) \tag{7-14.70}$$

This u(z) is now an even function of z and can thus be conceived as a
function of z². Hence we introduce the new variable

$$z^2 = \xi \tag{7-14.71}$$

and obtain for u(ξ) the differential equation

$$2\xi u' + u = \frac{1}{1 + \xi} \tag{7-14.72}$$

The equation we want to solve is thus

$$(1 + \xi)(2\xi u' + u) = 1 + \tau T_n^*(\xi) \tag{7-14.73}$$

The infinite Taylor expansion, obtained by substituting for u(ξ) a
polynomial of infinite order (without any τ term) becomes

$$u(\xi) = 1 - \frac{\xi}{3} + \frac{\xi^2}{5} - \frac{\xi^3}{7} + \cdots \tag{7-14.74}$$


We now derive the $Q_m(\xi)$ polynomials, omitting the first equation,
in order to relieve overdetermination. Hence the $Q_m(\xi)$ start with
m = 1, and we obtain

$$Q_m(\xi) = (-1)^{m-1} S_{m-1}(\xi) \tag{7-14.75}$$

Consequently the solution of (73), apart from the equation of zeroth
order, becomes

$$u(\xi) = \tau \sum_{m=1}^{n} (-1)^{m-1} c_m^n\, S_{m-1}(\xi) \tag{7-14.76}$$

The omitted equation was

$$u(0) = 1 + \tau c_0^n \tag{7-14.77}$$

which yields for τ the condition

$$\tau = \frac{1}{\displaystyle\sum_{m=0}^{n} (-1)^{m-1} c_m^n} = -\frac{1}{T_n^*(-1)} \tag{7-14.78}$$

and thus

$$u_{n-1}(\xi) = -\frac{1}{T_n^*(-1)} \sum_{m=1}^{n} (-1)^{m-1} c_m^n\, S_{m-1}(\xi) \tag{7-14.79}$$

The estimated maximum error is

$$\eta_{\max} = |\tau| = \frac{1}{|T_n^*(-1)|} \tag{7-14.80}$$


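If we substitute for ξ the value 1, the Taylor series gives the celebrated
Leibniz series for π/4,

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$$

which is very slowly convergent. Weighting the partial sums by the
Chebyshev coefficients according to (79) makes the series rapidly
convergent. The first five convergents of π/4 become

Taylor:

$$1, \qquad \frac{2}{3} = 0.667, \qquad \frac{13}{15} = 0.867, \qquad \frac{76}{105} = 0.724, \qquad \frac{263}{315} = 0.835$$

τ method:

$$\frac{2 \cdot 1}{1 + 2} = \frac{2}{3} = 0.667$$

$$\frac{8 + 8 \cdot \frac{2}{3}}{1 + 8 + 8} = \frac{40}{51} = 0.7843$$

$$\frac{18 + 48 \cdot \frac{2}{3} + 32 \cdot \frac{13}{15}}{1 + 18 + 48 + 32} = \frac{1166}{1485} = 0.78518$$

$$\frac{32 + 160 \cdot \frac{2}{3} + 256 \cdot \frac{13}{15} + 128 \cdot \frac{76}{105}}{1 + 32 + 160 + 256 + 128} = \frac{47584}{60585} = 0.78541$$

$$\frac{50 + 400 \cdot \frac{2}{3} + 1120 \cdot \frac{13}{15} + 1280 \cdot \frac{76}{105} + 512 \cdot \frac{263}{315}}{1 + 50 + 400 + 1120 + 1280 + 512} = \frac{2496018}{3178035} = 0.7853966 \tag{7-14.81}$$

$$\left( \frac{\pi}{4} = 0.785398163 \right)$$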
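
The τ-method convergents of π/4 can be regenerated exactly. This sketch assumes the weighting (7-14.79) with ξ = 1, that is, the Leibniz partial sums S_{m-1}(1) weighted by (-1)^m c_m^n and divided by T_n*(-1); the function names are hypothetical.

```python
from fractions import Fraction


def shifted_chebyshev_coeffs(n):
    """Ascending coefficients of T_n*(x) = T_n(2x - 1)."""
    prev, cur = [1], [-1, 2]
    if n == 0:
        return prev
    for _ in range(n - 1):
        # Recurrence T*_{k+1}(x) = (4x - 2) T*_k(x) - T*_{k-1}(x)
        nxt = [0] * (len(cur) + 1)
        for i, c in enumerate(cur):
            nxt[i] -= 2 * c
            nxt[i + 1] += 4 * c
        for i, c in enumerate(prev):
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def leibniz_partial_sum(m):
    """S_m(1) = 1 - 1/3 + 1/5 - ... + (-1)^m / (2m + 1)."""
    return sum(Fraction((-1) ** k, 2 * k + 1) for k in range(m + 1))


def tau_convergent(n):
    """u_{n-1}(1): weighted Leibniz partial sums divided by T_n*(-1)."""
    c = shifted_chebyshev_coeffs(n)
    t_at_minus_one = sum(c[m] * (-1) ** m for m in range(n + 1))
    weighted = sum((-1) ** m * c[m] * leibniz_partial_sum(m - 1)
                   for m in range(1, n + 1))
    return weighted / t_at_minus_one


convergents = [tau_convergent(n) for n in range(1, 6)]
for v in convergents:
    print(v, float(v))
```

Note that `Fraction` reduces to lowest terms, so 1166/1485 is printed as 106/135; the value is the same.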
However, our aim is to extend our solution to arbitrary complex
values of the variable ξ. For this purpose we make use of the
“parametric method,” discussed at the end of § 13. According to the
formula (13.20) the only difference is that the coefficient $c_m^n$ has
to be replaced by $c_m^n \xi^{-m}$. Hence the solution (76) is now
transformed into

$$u(\xi) = \tau \sum_{m=1}^{n} (-1)^{m-1} c_m^n\, \xi^{-m} S_{m-1}(\xi) \tag{7-14.82}$$

The omitted equation remains unchanged: $u(0) = 1 + \tau c_0^n$. This
determines τ in the form

$$\tau = \frac{1}{\displaystyle\sum_{m=0}^{n} (-1)^{m-1} c_m^n\, \xi^{-m}} = -\frac{1}{T_n^*(-1/\xi)} \tag{7-14.83}$$

and the final solution becomes

$$u_{n-1}(\xi) = -\frac{1}{T_n^*(-1/\xi)} \sum_{m=1}^{n} (-1)^{m-1} c_m^n\, \xi^{-m} S_{m-1}(\xi) \tag{7-14.84}$$


Going back to our original variable z, we finally obtain the following
sequence of approximations of the function y = arc tan z.

$$y_{n-1}(z) = -\frac{z}{T_n^*(-1/z^2)} \sum_{m=1}^{n} (-1)^{m-1} c_m^n\, z^{-2m} S_{m-1}(z^2) \tag{7-14.85}$$

This is no longer a simple polynomial approximation, but the ratio
of two polynomials. For example, putting n = 4 we obtain the
following approximation.

$$\operatorname{arc\,tan} z = \frac{32z}{105} \cdot \frac{420 + 700z^2 + 329z^4 + 38z^6}{128 + 256z^2 + 160z^4 + 32z^6 + z^8} \tag{7-14.86}$$

We demonstrate this approximation by applying it to the values
z = 2 and z = i/2. In the first case we obtain

$$\operatorname{arc\,tan} 2 = \frac{64}{105} \cdot \frac{10916}{6016} = 1.10598 \tag{7-14.87}$$

while the correct value is 1.10715.
In the second case we obtain

$$\operatorname{arc\,tan} \frac{i}{2} = \frac{16i}{105} \cdot \frac{67832}{18817} = 0.54930673i \tag{7-14.88}$$

Now the significance of the arc tan z function for imaginary arguments
is

$$\operatorname{arc\,tan} ip = \frac{i}{2} \log \frac{1 + p}{1 - p} \tag{7-14.89}$$

Hence

$$\operatorname{arc\,tan} \frac{i}{2} = \frac{i}{2} \log 3 = 0.54930614i \tag{7-14.90}$$
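
The rational approximation (7-14.85) can be evaluated for any n and for complex z. The sketch below is one possible transcription, assuming the partial sums S_m of the series 1 - ξ/3 + ξ²/5 - ... and the coefficients c_m^n of T_n*(x); the name `arctan_tau` is hypothetical. For n = 4 it should agree with the closed form (7-14.86).

```python
import cmath


def shifted_chebyshev_coeffs(n):
    """Ascending coefficients of T_n*(x) = T_n(2x - 1)."""
    prev, cur = [1], [-1, 2]
    if n == 0:
        return prev
    for _ in range(n - 1):
        # Recurrence T*_{k+1}(x) = (4x - 2) T*_k(x) - T*_{k-1}(x)
        nxt = [0] * (len(cur) + 1)
        for i, c in enumerate(cur):
            nxt[i] -= 2 * c
            nxt[i + 1] += 4 * c
        for i, c in enumerate(prev):
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def arctan_tau(z, n=4):
    """Approximate arc tan z by the weighted sum (7-14.85); z must be nonzero."""
    c = shifted_chebyshev_coeffs(n)
    xi = z * z
    t_val = sum(c[m] * (-1 / xi) ** m for m in range(n + 1))  # T_n*(-1/z^2)
    total = 0
    partial = 0  # running partial sum S_{m-1}(xi)
    for m in range(1, n + 1):
        partial += (-xi) ** (m - 1) / (2 * m - 1)
        total += (-1) ** (m - 1) * c[m] * xi ** (-m) * partial
    return -z * total / t_val


for z in (2, 0.5j, 1):
    print(z, arctan_tau(z), cmath.atan(z))
```

The formula divides by powers of z², so z = 0 has to be treated separately (the approximation is then trivially zero).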


What is the convergence of the approximation (85) as n grows to
infinity? The error is proportional to τ, and if τ converges to zero,
the approximation converges unlimitedly to f(z). Now in our
problem

$$\tau = -\frac{1}{T_n^*(-1/z^2)} \tag{7-14.91}$$

The Chebyshev polynomials $T_n^*(x)$ have the property that they grow
to infinity as n converges to infinity, at any point x which is outside
the interval [0, 1]. Inside that interval they continue to be bounded
by ±1. Hence the convergence will hold at any point z which avoids
the condition

$$0 \le -\frac{1}{z^2} \le 1 \tag{7-14.92}$$

This shows that the convergence holds everywhere in the complex
plane, except along the imaginary axis. But even along the imaginary
axis only the points beyond ±i are excluded. At all points between
−i and +i the convergence is still preserved.

This is exactly the convergence behavior which can be predicted on 
theoretical grounds. The complex ray which connects the points 0 
and z must not contain any singular points. Hence a singular point 
will cast a shadow behind it which reaches out to infinity. This 
shadow is constructed by continuing the straight line which connects 
the singular point with the origin. The singular points of the arc tan z 
function are at z = +i. Hence the solution must become divergent 
at all points of the imaginary axis which lie beyond these points. 


15. Estimation of the error of the τ method. In the numerical
examples of the previous section the fast convergence of the τ
expansions was demonstrated in a numerical way. It must be possible,
however, to obtain theoretical estimates for the inherent error of
these approximations. The relation of the Chebyshev polynomials to
the trigonometric functions puts us in the position to develop a
simple algebraic method for estimation of the error of the solution of
a linear differential equation, obtained by the application of the τ
method.

We consider a differential equation of the form

$$A(x)y' + B(x)y + C(x) = \tau T_n^*(x) \tag{7-15.1}$$


While the form of this equation seems very special, it actually covers 
a wide class of problems. Moreover, in the case of a differential 
equation of second or higher order it is advisable to introduce 
surplus functions and transform the given differential equation to 
a simultaneous system of equations of the form (1). Hence the 
method we are going to discuss actually has applicability to a larger 
group of problems which can be reduced to two or more equations 


of the type (1) and solved by the τ method. For our present purposes
it will suffice to consider only one single equation of the form (1).

The right side of (1) has its origin in the fact that we wanted to
solve the given differential equation by a finite expansion. Generally
we may need more than one τ term on the right side, but we can
estimate the error for each τ term separately and then take the
absolute sum of these errors. For this reason it will suffice to carry
through our investigation for one τ term only.

The exact solution of the given differential equation has no error 
term on the right side. 


$$A(x)y' + B(x)y + C(x) = 0 \tag{7-15.2}$$

The difference between the correct solution y(x) and the
approximation $y_n(x)$ represents the error of the solution

$$\eta_n(x) = y(x) - y_n(x) \tag{7-15.3}$$

characterized by the differential equation

$$A(x)\eta_n'(x) + B(x)\eta_n(x) = -\tau T_n^*(x) \tag{7-15.4}$$

It will be our aim to find an approximate solution of this differential
equation and thus obtain a close estimate of $\eta_n(x)$.
We introduce the angle variable θ as a new variable, replacing x by

$$x = \cos^2 \frac{\theta}{2}, \qquad dx = -\frac{1}{2} \sin\theta\, d\theta \tag{7-15.5}$$

Then we obtain in the new variable a differential equation of the
form

$$A_1 \eta' + B_1 \eta = -\tau \cos n\theta$$

where

$$A_1(\theta) = -\frac{2A(\cos^2(\theta/2))}{\sin\theta} \tag{7-15.6}$$

$$B_1(\theta) = B\left(\cos^2 \frac{\theta}{2}\right) \tag{7-15.7}$$

The right side is replaceable by $-\tau e^{in\theta}$ with the understanding
that we are going to use the real part of the solution only.


Now the equation

$$A_1 \eta' + B_1 \eta = -\tau e^{in\theta} \tag{7-15.8}$$

assuming that $A_1$ and $B_1$ are constants, can be conceived as the
differential equation of a ballistic galvanometer, put into forced
vibrations by an external periodic force. The solution of (8) is

$$\eta(\theta) = \frac{-\tau}{inA_1 + B_1}\, e^{in\theta} \tag{7-15.9}$$

It means that the galvanometer follows the external force, with
changed amplitude and phase. In our case $A_1$ and $B_1$ are not constants
but given functions of θ. Nevertheless, compared with the rapidly
changing exponential function $e^{in\theta}$ they still behave nearly like
constants. Hence we will not go far wrong and obtain a solution for
$\eta_n(\theta)$, at least for sufficiently large n, if we utilize the
solution (9) for our purposes, although we realize that this solution
has now only approximate value. For the exact treatment we should put

$$\eta = -G(\theta)\,\tau e^{in\theta} \tag{7-15.10}$$

and obtain for the changeable amplitude G(θ) the differential
equation

$$A_1(G' + inG) + B_1 G = 1 \tag{7-15.11}$$

However, for sufficiently large n we may consider G′ negligible
compared with inG. We then return to solution (9).

We see that the error $\eta_n(\theta)$ of our solution will be periodic in
character. If (9) is written in polar form,

$$\eta_n(\theta) = -\frac{\tau\, e^{i(n\theta - \varphi)}}{\sqrt{B_1^2 + n^2 A_1^2}} \tag{7-15.12}$$

with

$$\tan\varphi = \frac{nA_1}{B_1} \tag{7-15.13}$$

we obtain, by taking the real part of (12),

$$\eta_n(\theta) = -\frac{\tau}{\sqrt{B_1^2 + n^2 A_1^2}} \cos(n\theta - \varphi) \tag{7-15.14}$$

This is a harmonic vibration, with variable phase and amplitude.


From the expression (14) several conclusions can be drawn. In
the first place we want to know what will be the maximum error we
may encounter at any point of the interval. For this purpose we
investigate the quantity

$$\psi(x) = B^2(x) + \frac{n^2 A^2(x)}{x(1 - x)} \tag{7-15.15}$$

and find its minimum within the interval [0, 1]. Let this minimum be
$\psi_{\min}$. Then the maximum possible error $\eta_{\max}$ is estimated by

$$|\eta_{\max}| \le \frac{|\tau|}{\sqrt{\psi_{\min}}} \tag{7-15.16}$$
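
The bound (7-15.16) lends itself to a quick numerical scan. The sketch below is an illustration, not a procedure from the text: it samples B(x)² + n²A(x)²/(x(1 - x)) at interior grid points of (0, 1), takes the smallest value as the minimum of (7-15.15), and returns |τ| divided by its square root. The function name and the sampling density are assumptions.

```python
import math


def max_error_bound(A, B, n, tau, samples=1000):
    """Estimate |tau| / sqrt(psi_min) by scanning interior points of (0, 1)."""
    psi_min = min(
        B(x) ** 2 + n ** 2 * A(x) ** 2 / (x * (1 - x))
        for x in (k / samples for k in range(1, samples))
    )
    return abs(tau) / math.sqrt(psi_min)


# A(x) = -1, B(x) = 1: the minimum lies at x = 1/2 and equals 4 n^2 + 1.
bound = max_error_bound(lambda x: -1.0, lambda x: 1.0, n=3, tau=1.0)
print(bound, 1 / math.sqrt(4 * 3 ** 2 + 1))
```

The end points are excluded from the scan because x(1 - x) vanishes there. When the minimum sits at an end point, as happens when A(x) carries a factor x, the scan only approaches it from inside and the limit should be taken by hand.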


Furthermore, the “method of forced vibration,” expressed in (9),
will give a sufficiently close estimation of the residual $\eta(\theta)$ at
any given point x = cos² (θ/2). We come into difficulty, however, at
the two end points x = 0 and x = 1 of the range. Here the denominator
of $A_1(\theta)$ [cf. (6)] becomes zero, and thus $A_1(\theta)$ becomes infinite.
The estimated error (9) will thus become zero. This indicates that the
error at the points x = 0 and x = 1 is particularly small. But
exactly at the end point x = 1 the error is of particular interest because
in the parametric method the variable point z becomes the end point
of the range.

In many problems the difficulty exists only at the point x = 1
since at the point x = 0 the function A(x) may become zero, because
of the presence of the factor x. Then the error estimate (9) does not
fail at the lower limit and requires readjustment only at the upper
limit x = 1.

We will now avoid the denominator of (6) by multiplying through
by sin θ. Then the right side becomes

$$-\tau \cos n\theta \sin\theta = -\frac{\tau}{2} \left[ \sin(n + 1)\theta - \sin(n - 1)\theta \right]$$

and the same method which led to (9) now gives

$$\eta_n = \frac{-\tau}{-2i(n + 1)A + B\sin\theta} \cdot \frac{e^{i(n+1)\theta}}{2i} + \frac{\tau}{-2i(n - 1)A + B\sin\theta} \cdot \frac{e^{i(n-1)\theta}}{2i} \tag{7-15.17}$$

At the point x = 1, i.e., θ = 0, we obtain

$$\eta_n(1) = \frac{\tau}{2} \left( -\frac{1}{2(n + 1)A(1)} + \frac{1}{2(n - 1)A(1)} \right) = \frac{\tau}{2(n^2 - 1)A(1)} \tag{7-15.18}$$

This formula shows that while the general order of magnitude of the
error is τ/n, at the end point of the range the order of magnitude
drops down to τ/n².

In the light of these results we will once more examine the examples 
of the previous section. 

Example 1. Here

$$A(x) = -1, \qquad B(x) = 1$$

The upper bound (15.16) for the error at any point of the range
becomes

$$\eta_{\max} = \frac{|\tau_n|}{\sqrt{4n^2 + 1}}$$

while the estimated error at the end point x = 1 becomes

$$\eta_n(1) = -\frac{\tau_n}{2(n^2 - 1)} \tag{7-15.19}$$

Now the $\tau_n$ of this problem are all positive, and thus it seems that we
cannot explain the fact that the e values given in (14.8) approach the
limit alternately from above and below. Formula (19) indicates that
the error should remain permanently negative. The apparent
discrepancy has the following reason. The error estimate (17) gives a
definite error at the point x = 0, i.e., θ = π. This error is $(-1)^{n+1}$
times the error at x = 1.

$$\eta_n(0) = (-1)^{n+1} \eta_n(1) \tag{7-15.20}$$


But in the solution (14.6) we have satisfied the boundary condition 
(14.3) exactly, and this means that we have no error at x = 0. For 
this reason we have to correct our estimated periodic error by a 
“systematic error,” caused by the fact that at the lower boundary we 
did not make allowance for the natural error which should exist at 
that point. 


In solving the differential equation (4) we did not take into account
the solution of the homogeneous differential equation which can
always be added, multiplied by an arbitrary constant C. Since in our
problem the homogeneous differential equation characterizes the
function to be obtained, i.e., $e^x$, addition of this solution will change
$\eta_n(1)$ by an additional term Ce, while $\eta_n(0)$ will change by C. The
condition $\eta_n(0) = 0$ determines C to

$$C = (-1)^n \eta_n(1)$$

and thus the resultant error at the point x = 1 becomes

$$\eta_n(1) = -\frac{\tau_n}{2(n^2 - 1)} \left[ 1 + (-1)^n e \right] \tag{7-15.21}$$

Since e is larger than 1, the sign of $\eta_n(1)$ will alternate, and the error
will become negative for even n and positive for odd n, in accordance
with the facts. The estimate (21) is very satisfactory from n = 3 on.

Example 2. This example is purely algebraic. The differential
operator is missing, with the consequence that the equation (4)
becomes exactly solvable. Instead of an estimate of the error $\eta_n(x)$
we have an exact algebraic representation of the error. The
comparison with the error of the Taylor expansion has been discussed
before and needs no further elaboration.

Example 3. Here

$$A(x) = x^2, \qquad B(x) = 1 + x$$

The error becomes the largest at x = 0; the estimated maximum
error is thus

$$\eta_{\max} = |\tau_n|$$

while at the end point x = 1 the following estimate holds.

$$\eta_n(1) = \frac{\tau_n}{2(n^2 - 1)} \tag{7-15.22}$$

The expression (14.47) for $\tau_n$ shows that $\tau_n$ is negative for even n
and positive for odd n. If we apply (22) to the successive
approximations (14.50) of y(1), remembering that $\tau_n$ is associated with
$y_{n-1}(x)$, we find that the estimate (22) becomes effective from n = 5
on. The alternate approach from below and above is not disturbed from
that point on.
Example 4. In this example

$$A(x) = 2x, \qquad B(x) = -1$$

and we obtain the estimated upper bound of the error at any point of
the range,

$$\eta_{\max} = |\tau_n|$$

and the error estimate at the end point,

$$\eta_n(1) = \frac{\tau_n}{4(n^2 - 1)} \tag{7-15.23}$$

The error at the point x = 1 is so small compared with the much
larger error at x = 0 that we lose very little in accuracy if we put
$\eta_n(1) = 0$ and determine $\tau_n$ from the boundary condition y(1) = 1
as we have done in (14.57). However, it is of interest to demonstrate
the accuracy of the estimate (23) by taking it into account and
obtaining τ from the condition

$$\tau_n \sum_{m=0}^{n} \frac{c_m^n}{2m - 1} = 1 - \frac{\tau_n}{4(n^2 - 1)} \tag{7-15.24}$$

If we do so, the quantity $1/(n\tau_n)$ becomes

$$\frac{1}{n\tau_n} = \frac{1}{4n(n^2 - 1)} + \frac{1}{n} \sum_{m=0}^{n} \frac{c_m^n}{2m - 1} \tag{7-15.25}$$

According to the error discussion in § 14, this quantity should be
theoretically equal to $(-1)^{n+1}\pi$. Earlier we have obtained for the
case n = 6 (cf. 14.68) the value −3.1427, disregarding the first term
on the right side of (25). But if this term is taken into account, we
obtain the much closer value −3.141522. The accuracy has increased
from 1 : 2800 to 1 : 45000.
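
The effect of the end-point correction in (7-15.25) can be reproduced with exact fractions. The following sketch assumes the form 1/(nτ_n) = 1/(4n(n² - 1)) + (1/n) Σ c_m^n/(2m - 1), with c_m^n the coefficients of T_n*(x); the function name and the `corrected` switch are hypothetical.

```python
from fractions import Fraction


def shifted_chebyshev_coeffs(n):
    """Ascending coefficients of T_n*(x) = T_n(2x - 1)."""
    prev, cur = [1], [-1, 2]
    if n == 0:
        return prev
    for _ in range(n - 1):
        # Recurrence T*_{k+1}(x) = (4x - 2) T*_k(x) - T*_{k-1}(x)
        nxt = [0] * (len(cur) + 1)
        for i, c in enumerate(cur):
            nxt[i] -= 2 * c
            nxt[i + 1] += 4 * c
        for i, c in enumerate(prev):
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def inverse_n_tau(n, corrected=True):
    """1 / (n tau_n), with or without the end-point term 1 / (4 (n^2 - 1))."""
    c = shifted_chebyshev_coeffs(n)
    total = sum(Fraction(c[m], 2 * m - 1) for m in range(n + 1))
    if corrected:
        total += Fraction(1, 4 * (n * n - 1))
    return float(total / n)


print(inverse_n_tau(6, corrected=False))  # uncorrected value for n = 6
print(inverse_n_tau(6, corrected=True))   # with the first term of (25)
```

For n = 6 the two results should match the values −3.1427 and −3.141522 quoted in the text.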

Example 5. In this example the parametric method was applied,
obtaining y = f(z) in such fashion that z² is considered a mere
constant of the given differential equation. Here

$$A(x) = (1 + z^2 x)\, 2x, \qquad B(x) = 1 + z^2 x$$

The over-all accuracy within the range [0, 1] is here of no importance,
since we are going to use the solution solely at the end point x = 1,
which corresponds in the original variable to the point x = z. Hence
we get

$$\eta_n(1) = \frac{\tau_n(z)}{4(n^2 - 1)(1 + z^2)}$$

This holds for the function u(z). For the arc tan z function we obtain,
according to (14.70),

$$\eta_n(z) = \frac{z\, \tau_n(z)}{4(n^2 - 1)(1 + z^2)} = -\frac{z}{4(n^2 - 1)(1 + z^2)} \cdot \frac{1}{T_n^*(-1/z^2)} \tag{7-15.26}$$

Let us apply this error estimate to the case z = 1, i.e., to the π/4
values of the table (14.81). The quantity $T_n^*(-1)$ alternates in sign,
being positive for even n and negative for odd n. Hence the successive
convergents of π/4 should approach the correct value from below for
odd n and from above for even n. We find this verified from n = 3 on.
On the other hand, the error of (14.87) is not well estimated by (26).
If we put z = 2, the function A(x) is no longer sufficiently smooth to
make the formula (26) applicable for n = 4. We would have to go to
somewhat larger n for this purpose. Quite different, however, is the
case of the substitution z = i/2. The surprisingly great accuracy of
the approximation (14.88) is explained by the large value of $T_4^*(4)$.

$$T_4^*(4) = 18817$$

Here the estimate (26) gives

$$\eta_4\left(\frac{i}{2}\right) = -\frac{i}{90 \cdot 18817} = -0.59 \cdot 10^{-6}\, i$$

in close agreement with the actual error.
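
The agreement between the estimate (7-15.26) and the actual error at z = i/2 can be checked directly. This sketch assumes the closed form (7-14.86) for n = 4 and evaluates T_n*(x) by its three-term recurrence, which also works for complex arguments; the function names are hypothetical.

```python
import cmath


def shifted_chebyshev_value(n, x):
    """T_n*(x) = T_n(2x - 1) by the three-term recurrence."""
    prev, cur = 1, 2 * x - 1
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, (4 * x - 2) * cur - prev
    return cur


def arctan_n4(z):
    """Rational approximation (7-14.86) of arc tan z."""
    s = z * z
    num = 420 + 700 * s + 329 * s ** 2 + 38 * s ** 3
    den = 128 + 256 * s + 160 * s ** 2 + 32 * s ** 3 + s ** 4
    return 32 * z / 105 * num / den


def estimated_error(z, n):
    """Estimate (7-15.26) of arc tan z minus its n-th approximation."""
    t_val = shifted_chebyshev_value(n, -1 / (z * z))
    return -z / (4 * (n * n - 1) * (1 + z * z) * t_val)


z = 0.5j
actual = cmath.atan(z) - arctan_n4(z)
estimate = estimated_error(z, 4)
print(shifted_chebyshev_value(4, 4))
print(actual, estimate)
```

Both numbers should come out close to −0.59 · 10⁻⁶ i, and the first printed value is T_4*(4).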


16. The square root of a complex number. The square root of a 
complex number is obtainable in the regular algebraic fashion, but 
the numerical procedure involved is quite laborious. The following 
method gives the answer much more rapidly. 


Let the complex number whose square root is desired be denoted
by A + Bi. If A is larger in magnitude than B, we put

$$\sqrt{A + Bi} = \sqrt{A}\, \sqrt{1 + iB/A} \tag{7-16.1}$$

If, on the other hand, B is larger in magnitude than A, we put

$$\sqrt{A + Bi} = \sqrt{B}\, \sqrt{i + A/B} \tag{7-16.2}$$

Hence our problem is reducible to the evaluation of one of the
following functions:

$$y = \sqrt{1 + ix} \tag{7-16.3}$$

or

$$y = \sqrt{i + x} \tag{7-16.4}$$

where x is limited to the interval [0, 1]. By taking out $\sqrt{i}$ as a
factor, (4) is reducible to (3). We will agree that x is to vary only
between 0 and 1. Hence $\sqrt{1 - ix}$ is not to be obtained by letting x
become negative, but by changing i to −i.

Our aim will be to obtain a quickly convergent expansion for the
function (3) with the help of the τ method. For this purpose we
characterize the function (3) by the differential equation

$$\frac{y'}{y} = \frac{i}{2(1 + ix)} \tag{7-16.5}$$

We can likewise say, however, that we solve the differential equation

$$\frac{y'}{y} = \frac{1}{2(1 + x)}$$

or

$$2(1 + x)y' - y = 0 \tag{7-16.6}$$

along the imaginary axis.

According to the standard procedure we put a τ term on the right
side of the differential equation (6). The polynomial $T_n^*(x)$ becomes
automatically adjusted to the imaginary axis if we write it in the
form $T_n^*(x/i)$. It so happens, however, that in our problem the
operator y′ is dominant. Hence we obtain better results if we do not use
the $T_n^*$ polynomials directly, but their derivative, formulating our
problem as follows.

$$2(1 + x)y' - y = \tau\, T_{n+1}^{*\prime}(x) \tag{7-16.7}$$

The Taylor expansion of our problem gives the ordinary binomial
expansion with n = 1/2,

$$(1 + x)^{1/2} = 1 + \frac{x}{2} - \frac{x^2}{8} + \frac{x^3}{16} - \cdots \tag{7-16.8}$$

and the method of the Q polynomials shows that the partial sums of
this expansion have to be weighted as follows.

$$y_n = \tau \sum_{m=0}^{n} \frac{c_m^n}{a_m(2m - 1)}\, S_m(x) \tag{7-16.9}$$

Here $c_m^n$ are the coefficients of the polynomial which appears on the
right side of (7). Moreover, $a_m$ is the last coefficient of the partial
sum $S_m(x)$. Finally, since the variable x is supposed to move along
the imaginary axis, x should be replaced by x/i.

As an example, let us choose n = 2. Then

$$T_3^*(x) = -1 + 18x - 48x^2 + 32x^3$$

$$T_3^{*\prime}(x) = 18 - 96x + 96x^2$$

and

$$T_3^{*\prime}(x/i) = 18 + 96ix - 96x^2$$

Moreover

$$a_0 = 1, \qquad a_1 = \tfrac{1}{2}, \qquad a_2 = -\tfrac{1}{8}$$

Hence

$$y_2(x) = \tau \left[ -18 + 192i \left( 1 + \frac{ix}{2} \right) + 256 \left( 1 + \frac{ix}{2} + \frac{x^2}{8} \right) \right] = \tau \left[ 238 + 192i + (-96 + 128i)x + 32x^2 \right]$$

The factor τ can be determined from the boundary condition

$$y(0) = 1$$

which gives

$$\tau = \frac{1}{238 + 192i}$$

and the final result becomes

$$\sqrt{1 + ix} = 1 + 0.0184x + 0.0810x^2 + i(0.5201x - 0.0653x^2) \tag{7-16.10}$$

$$= 1 + x(0.0184 + 0.0810x)\ (\pm 0.002) + ix(0.5201 - 0.0653x)\ (\pm 0.002i)$$

The accuracy of this quadratic approximation is remarkably high,
since the maximum error at any point of the range [0, 1] does not
surpass 2 units in the 3rd decimal. This accuracy will suffice for
many problems.

If we go one step further and deduce the corresponding formula
for the case n = 3, we obtain the following cubic approximation,
which is already accurate to 2 units of the fourth decimal place (in
both the real and the imaginary parts).

$$\sqrt{1 + ix} = 1 - 0.00316x + 0.14237x^2 - 0.04079x^3 + i(0.50637x - 0.03108x^2 - 0.02020x^3) \tag{7-16.11}$$

The quadratic approximation for the function $\sqrt{i + x}$ becomes

$$\sqrt{i + x} = 0.7071 + x(0.3807 + 0.0111x) + i\left[ 0.7071 - x(0.3548 - 0.1035x) \right] \tag{7-16.12}$$

For the sake of convenience we add two further formulas, in order to
be prepared for any combination of signs in A and B.

$$\sqrt{-1 + ix} = x(0.5201 - 0.0653x) + i\left[ 1 + x(0.0184 + 0.0810x) \right] \tag{7-16.13}$$

$$\sqrt{i - x} = 0.7071 - x(0.3548 - 0.1035x) + i\left[ 0.7071 + x(0.3807 + 0.0111x) \right] \tag{7-16.14}$$
As an example, let us evaluate $\sqrt{-3 - 8i}$.

$$\sqrt{-3 - 8i} = \sqrt{8}\, \sqrt{-i - 0.375}$$

Here the formula (14) comes into operation, with x = 0.375 and
changing i into −i.

$$\sqrt{-3 - 8i} = 2.8284\left[ 0.7071 - 0.375 \cdot 0.3150 - i(0.7071 + 0.375 \cdot 0.3849) \right] = 1.6659 - 2.4081i$$

The correct result is

$$\sqrt{-3 - 8i} = 1.66493 - 2.40250i$$

The error is not more than 0.3%.
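
The reduction of an arbitrary square root to the single function of (7-16.3) can be written out as a short routine. The sketch below is one possible arrangement, not the author's tabulated scheme: it uses only the quadratic formula (7-16.10) with the printed coefficients, and handles the sign combinations of A and B through conjugation and the factors i and the square root of i rather than through the separate formulas (12) to (14). The function names are hypothetical.

```python
import cmath
import math


def sqrt_1_plus_ix(x):
    """Quadratic approximation (7-16.10) of sqrt(1 + i x) for 0 <= x <= 1."""
    return complex(1 + x * (0.0184 + 0.0810 * x), x * (0.5201 - 0.0653 * x))


def approx_sqrt(z):
    """Principal square root of z, reduced to sqrt(1 + i x) with 0 <= x <= 1."""
    z = complex(z)
    a, b = z.real, z.imag
    if a == 0 and b == 0:
        return 0j
    if b < 0:
        # Lower half plane: use sqrt(conj(z)) = conj(sqrt(z))
        return approx_sqrt(z.conjugate()).conjugate()
    if abs(a) >= abs(b):
        if a > 0:
            return math.sqrt(a) * sqrt_1_plus_ix(b / a)
        # a < 0, b >= 0: sqrt(z) = i sqrt(-z), with -z = |a| (1 - i b/|a|)
        return 1j * math.sqrt(-a) * sqrt_1_plus_ix(b / -a).conjugate()
    # |b| > |a| and b > 0: z = i b (1 - i a/b)
    root_i = complex(math.sqrt(0.5), math.sqrt(0.5))
    t = a / b
    factor = sqrt_1_plus_ix(-t) if t <= 0 else sqrt_1_plus_ix(t).conjugate()
    return root_i * math.sqrt(b) * factor


for z in (-3 - 8j, 3 + 8j, 5 + 1j, -5 + 1j, 2 - 7j):
    print(z, approx_sqrt(z), cmath.sqrt(z))
```

Because the base formula is accurate to about two units in the third decimal on a quantity of magnitude at least 1, the relative error of the result stays at a fraction of a per cent for every argument.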


17. Generalization of the τ method. The method of selected points.
The τ method is directly applicable only if the given linear differential
equation has coefficients which are polynomials of x. For more
general equations we will not succeed with the construction of a
proper error term on the right side of the equation which would
allow a solution in form of a finite power series of strong convergence.
However, we can reformulate the significance of the τ method in a
way which makes it applicable to a much wider class of problems.
Let us return once more to the general idea of the τ method. We
made the linear differential equation

$$Dy = 0 \tag{7-17.1}$$

solvable by a finite power expansion through the device that we have
put a properly chosen error term on the right side. Sometimes a term
of the form $\tau T_n^*(x)$ sufficed, but more generally we had to put

$$Dy = T_n^*(x)(\tau_0 + \tau_1 x + \cdots + \tau_m x^m) \tag{7-17.2}$$

By this device we have replaced y(x) by a certain

$$y_p(x) = b_0 + b_1 x + \cdots + b_p x^p \tag{7-17.3}$$


which closely approximated y(x) in the range [0, 1]. This procedure 
is limited, however, to linear differential equations with polynomial 
coefficients. 

We will now approach the problem from an entirely different 
viewpoint. Again we assume that y(x) shall be approximated by a 
finite polynomial of the order p. This puts at our disposal the p + 1 
constants 

$$b_0,\ b_1,\ \cdots,\ b_p \tag{7-17.4}$$

which are uniquely determined by p + 1 independent equations.
This shows immediately that we cannot hope to satisfy the given
differential equation in any continuous range of the variable x. If we
satisfy the given differential equation and the given boundary
conditions in only p + 1 points, this is generally enough for unique
determination of the constants $b_i$, and any further conditions would
lead to overdetermination.

The question is now reduced to the proper selection of the p + 1
points in which the given equation is to be satisfied. If initial values
or boundary conditions are given, some of the $b_i$ coefficients are
already absorbed by these conditions, and the number of free
parameters is no longer p + 1, but p + 1 − ν, where ν is the number
of the given boundary conditions. Our problem is thus reduced to
the proper choice of n = p + 1 − ν points. One possible choice
would be to put all these n points exceedingly close to the origin, i.e.,
to satisfy the given equation at the x values

$$x = 0,\ \varepsilon,\ 2\varepsilon,\ \cdots,\ (n - 1)\varepsilon$$

where ε converges to zero. The expansion thus obtained corresponds
to the Taylor expansion of y = f(x), truncated to p + 1 terms.

Quite a different distribution of points is suggested by equation (2).
The right side of the differential equation (2) is zero at the zeros of
the polynomial $T_n^*(x)$. These are the points

$$x_k = \frac{1 + \cos[(2k - 1)\pi/2n]}{2} \qquad (k = 1, 2, \cdots, n) \tag{7-17.5}$$

If it is advisable to put $T_{n+1}^{*\prime}(x)$ rather than $T_n^*(x)$ on the right
side, the zeros of the “Chebyshev polynomials of the second kind” come
into operation, characterized by the conditions

$$x_k = \frac{1 + \cos[k\pi/(n + 1)]}{2} \qquad (k = 1, 2, \cdots, n) \tag{7-17.6}$$

These points have a simple geometrical significance. We erect a
semicircle with the center x = 1/2 and the radius 1/2 above the [0, 1]
line as diameter, divide it into n + 1 equal parts and project these
points down on the base line. Excluding the two end points¹ we thus
obtain n unequally spaced points which are spread over the entire
interval instead of being crowded around the point x = 0. With this
distribution of the zeros, we will get a polynomial approximation whose
error oscillates practically evenly over the given interval instead of
yielding a very small error in the neighborhood of x = 0 but a large
error in the neighborhood of x = 1.

¹ We do not lose much in accuracy if we include the two end points, replacing
n + 1 by n − 1 in the formula (17.6). The computational scheme is frequently
simplified by inclusion of the two end points.

We now see that the inhomogeneous differential equation (2) can
be solved in two different ways. One way is to solve the recurrence
relations for the coefficients and thus determine the $b_i$ coefficients
and the $\tau_i$. The other is to omit the $\tau_i$ altogether, perform the
differential operator D on the finite expansion (3) and put the resulting
expression equal to zero at all those points at which the right-hand
side of the differential equation is zero. This, together with the
boundary conditions, yields a linear system of algebraic equations
for determination of the $b_i$. The two methods give identical results,
but the second method has more universal significance. It is
applicable to linear differential equations whose coefficients are not
necessarily algebraic in x. It is likewise applicable to integral
equations or mixed differential-integral equations as long as they are
linear. The general procedure is always the same. We replace y(x) by
a finite polynomial expansion, perform the operations indicated by
the given differential or integral operator on this expansion, and
equate the result of the operation to zero, not for arbitrary values of x,
which is impossible, but for a carefully selected discrete set of x
values, which will guarantee a satisfactorily even distribution of the
error over the given interval. What we obtain is a simultaneous
system of ordinary linear algebraic equations for the $b_i$ from which
these $b_i$ can be uniquely obtained. If the order p of the approximating
polynomial is changed, the entire procedure has to be repeated
and an entirely new set of $b_i$ values obtained.

This method of solving differential or integral equations by
well-convergent power expansions is a generalization of the method
of “trigonometric interpolation,” discussed before in IV, 16. If the
functional values of y = f(x) are known, a close polynomial
approximation of f(x) can be obtained by fitting the functional values at
the points (5) or (6). This amounts to an equidistant trigonometric
interpolation of a function of θ with the help of the Fourier cosine
functions cos kθ. If, however, the functional values of f(x) are not
known but we possess the basic law which determines f(x) with the
help of a linear operator, we can again obtain a well-convergent
polynomial approximation of f(x) by interpolating the operator
instead of the functional values. The procedure is in both cases
similar, the only practical difference being that in the latter case we
do not have the orthogonality of the resulting set of linear equations
which would make them explicitly solvable. Since, however, in many
practical applications the order of the approximating polynomial
need not go beyond 4 or 5, the linear sets involved are of low order
and can be solved by successive eliminations. We thus obtain an
elegant numerical procedure which generalizes the benefits of the
τ method to a much wider class of problems.¹

¹ For a practical example of this method of selected points cf. [5], p. 195.
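
The method of selected points can be illustrated on a small case. The sketch below is an example chosen for this purpose, not one taken from the text: it approximates the solution of y′ − y = 0, y(0) = 1 by a polynomial of order p, demanding that the equation hold at the p zeros (7-17.5) of T_p*(x) and solving the resulting linear system by successive elimination. The function names and the choice p = 5 are assumptions.

```python
import math


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / a[r][r]
    return x


def selected_points_exp(p):
    """Coefficients b_0..b_p with y' - y = 0 at the zeros of T_p*, y(0) = 1."""
    nodes = [(1 + math.cos((2 * k - 1) * math.pi / (2 * p))) / 2
             for k in range(1, p + 1)]
    matrix = [[1.0] + [0.0] * p]  # boundary condition y(0) = 1
    rhs = [1.0]
    for x in nodes:
        # Operator applied to x^j gives j x^(j-1) - x^j
        matrix.append([(j * x ** (j - 1) if j else 0.0) - x ** j
                       for j in range(p + 1)])
        rhs.append(0.0)
    return solve_linear(matrix, rhs)


b = selected_points_exp(5)
y_at_1 = sum(b)  # value of the polynomial at x = 1
print(b)
print(y_at_1, math.e)
```

With one boundary condition (ν = 1) the number of selected points is n = p + 1 − ν = p, which is why the zeros of T_p*(x) are used. Changing p requires solving the whole system again, as noted in the text.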


Bibliographical References 


[1] BROMWICH, T. J. I'A., Introduction to the Theory of Infinite
Series (Macmillan, London, 1926).


[2] JAHNKE, E., and EMDE, F., Tables of Functions with Formulae
and Curves (Dover Publications, New York, 1945).


[3] SZEGŐ, G., “Orthogonal Polynomials,” Am. Math. Soc. Colloq.
Pub., 23, 1939.


[4] Tables of the Chebyshev Polynomials S_n(x) and C_n(x), AMS 9,
National Bureau of Standards (1952).


Articles
[5] LANCZOS, C., “Trigonometric Interpolation of Empirical and
Analytical Functions,” J. Math. Phys., 17, 123 (1938).


[6] MILLER, J. C. P., “Two numerical applications of Chebyshev
polynomials,” Roy. Soc. Edin. Proc. 62, 204 (1946).


Appendix


NUMERICAL TABLES


Table I
Amplitude of Complex Roots (see p. 36)

  r     A(r)   |   r     A(r)   |   r     A(r)   |   r     A(r)
 0.80   3.3588 |  0.90   4.7774 |  1.00   7.0669 |  1.10  10.8412
 0.81   3.4732 |  0.91   4.9594 |  1.01   7.3644 |  1.11  11.3353
 0.82   3.5928 |  0.92   5.1505 |  1.02   7.6773 |  1.12  11.8553
 0.83   3.7179 |  0.93   5.3510 |  1.03   8.0062 |  1.13  12.4028
 0.84   3.8490 |  0.94   5.5615 |  1.04   8.3522 |  1.14  12.9790
 0.85   3.9862 |  0.95   5.7825 |  1.05   8.7161 |  1.15  13.5857
 0.86   4.1299 |  0.96   6.0146 |  1.06   9.0990 |  1.16  14.2243
 0.87   4.2805 |  0.97   6.2585 |  1.07   9.5019 |  1.17  14.8967
 0.88   4.4383 |  0.98   6.5147 |  1.08   9.9257 |  1.18  15.6046
 0.89   4.6038 |  0.99   6.7839 |  1.09  10.3718 |  1.19  16.3499
               |                |                |  1.20  17.1345


Table II
Transformation Matrices B_n (see p. 38)
(Negative numbers underlined)


B; 
1 1 1 
5 3 1 
10 2 2 
10 2 2 
5 3 1 
111 

B, 
111111 
753113 
21913 3 1 
35 55335 
35 5 5335 
2191331 
I3 3.2 7 3 
111111 
1 
9 
36 
84 
126 
126 
84 
36 
9 


Ia MO o = 


a O Ca 


Table II (Cont’d) 

1 1 11 
3 5 6 4 
2 10 15 5 
2 10 20 0 
3 5 15 5 
11 6 4 
1 1 

1 1 1 
7 8 6 
21 28 14 
35 56 14 
35 70 0 
21 56 14 
7 28 14 
l 8 6 
1 1 

B, 

1 1111 1 
53113 5 
80440 8 
08448 0 
1446 6 6 6 14 
1446 6 6 6 14 
08448 0 
80440 8 
53113 5 
1 1 1 1 1 1 


IS 
= If A Lh O Aa A IA 


— IN 
jæi 


N A O IN N N je 


= IN 


a Oo ^o O 


© 


Kn Ie 




Table II (Cont’d) 

Bio 
1 1 1 1 1 1 1 
10 6 4 2 0 2 4 6 8 10 
45 27 3 3 5 3 3 13 22 45 
120 48 8 8 8 O 8 8 8 48 120 
210 42 14 14 2 10 2 14 14 42 210 
252 0 28 O 12 0 12 O 28 0 252 
210 42 14 14 2 10 2 14 14 42 210 
120 48 8 8 8 O 8 8 8 48 120 
45 27 13 3 3 5 3 3 13 27 45 
10 8 6 42 0 2 4 6 8 10 
111141 1 

By 
1 1 1 1 1 1 1 1 1 1 1 i 
1 9 7 53 1 13579 Si 
55 35 19 7 1 5 5 1 7 19 35 55 
165 75 21 5 11 5 5 11 5 21 75 165 
330 90 22 6 10 10 6 22 6 90 330 
462 42 42 14 14 10 10 14 14 42 42 462 
462 42 42 14 14 10 10 14 14 42 42 462 
330 90 6 22 6 10 10 6 22 6 90 330 
165 75 21 5 1 5 5 11 5 21 75 165 
55 35 19 7 1 5 5 1 71935 55 
u 9 7 5 3 1 1 3 5 79 Th 
1 1 1i 1 1 1 1 1 1 1 1 1 


Table II (Cont’d) 


B,: 

1 1 1 1 21 1 1 1 1 1 1 1 1 

12 10 8 6 4 2 0 2 4 6 8 10 12 

66 44 26 12 2 4 6 4 2 12 26 44 66 

220 110 40 2 12 10 O 10 12 2 40 110 220 

495 165 15 27 17 5 15 5 17 27 15 165 495 

792 132 48 36 8 20 0 20 8 36 48 132 792 

924 0 84 O 28 20 0 28 O 84 O 924 

792 132 48 36 8 20 O0 20 8 36 48 132 792 

495 165 15 27 17 5 15 5 17 27 15 165 495 

220 110 40 2 12 10 O 10 12 2 40 110 220 

66 44 26 12 2 4 6 4 2 12 26 44 66 

12 10 8 6 4 2 0 2 4 6 8 10 12 

1 1 1 £ 1 1 1 1 1 1 1 1 

Table III

Inversion of Eigenvalues (see p. 203). Smallest value of v: v_0 = \sqrt{3}/\pi = 0.5513288953.
For very small u: u = u* \sqrt{v - v_0}, where u* = \sqrt{5\sqrt{3}/\pi} = 1.660314572. The tabular u*
between v = 0.56 and v = 1 has to be multiplied by \sqrt{v - v_0}. Hence u^2 = u*^2 (v - 0.55133).
Beyond v = 1 the table gives the value of u directly.


  v     u*    |   v     u*    |   v     u*    |   v     u*
 0.56  1.6548 |  0.67  1.5945 |  0.78  1.5489 |  0.89  1.5163
 0.57  1.6485 |  0.68  1.5898 |  0.79  1.5457 |  0.90
 0.58  1.6425 |  0.69  1.5853 |  0.80  1.5427 |  0.91
 0.59  1.6366 |  0.70  1.5810 |  0.81  1.5395 |  0.92
 0.60  1.6309 |  0.71  1.5767 |  0.82  1.5363 |  0.93
 0.61  1.6253 |  0.72  1.5724 |  0.83  1.5331 |  0.94
 0.62  1.6198 |  0.73  1.5684 |  0.84  1.5301 |  0.95
 0.63  1.6145 |  0.74  1.5644 |  0.85  1.5272 |  0.96
 0.64  1.6093 |  0.75  1.5606 |  0.86  1.5243 |  0.97
 0.65  1.6043 |  0.76  1.5568 |  0.87  1.5216 |  0.98
 0.66  1.5993 |  0.77  1.5527 |  0.88  1.5189 |  0.99


04 0393 04 | 0397 04 0398 04 0398 
06 0584 06 | 0592 06 0595 06 0596 
08 0772 08 0785 08 0790 08 0793 




Table III (Cont'd) 


2.0978 
1169 
1359 
1548 
1737 
1925 
2113 
2300 
2488 
2676 
2863 
3052 
3241 
3431 
3622 
3813 
4007 
4201 
4399 
4595 
4794 
4995 
5198 
5402 
5607 
5814 
6023 




Table IV 


Coefficients of the First 15 Shifted Legendre Polynomials P*_n(x) (see p. 287).
(Negative numbers carry a minus sign; sequence: lowest to highest power;
e.g., P*_3(x) = 1 - 12x + 30x^2 - 20x^3)


n = 1:   1, -2
n = 2:   1, -6, 6
n = 3:   1, -12, 30, -20
n = 4:   1, -20, 90, -140, 70
n = 5:   1, -30, 210, -560, 630, -252
n = 6:   1, -42, 420, -1680, 3150, -2772, 924
n = 7:   1, -56, 756, -4200, 11550, -16632, 12012, -3432
n = 8:   1, -72, 1260, -9240, 34650, -72072, 84084, -51480, 12870
n = 9:   1, -90, 1980, -18480, 90090, -252252, 420420, -411840, 218790, -48620
n = 10:  1, -110, 2970, -34320, 210210, -756756, 1681680, -2333760, 1969110, -923780,
         184756
n = 11:  1, -132, 4290, -60060, 450450, -2018016, 5717712, -10501920, 12471030, -9237800,
         3879876, -705432
n = 12:  1, -156, 6006, -100100, 900900, -4900896, 17153136, -39907296, 62355150, -64664600,
         42678636, -16224936, 2704156
n = 13:  1, -182, 8190, -160160, 1701700, -11027016, 46558512, -133024320, 261891630,
         -355655300, 327202876, -194699232, 67603900, -10400600
n = 14:  1, -210, 10920, -247520, 3063060, -23279256, 116396280, -399072960, 960269310,
         -1636014380, 1963217256, -1622493600, 878850700, -280816200, 40116600
n = 15:  1, -240, 14280, -371280, 5290740, -46558512, 271591320, -1097450640,
         3155170590, -6544057520, 9816086280, -10546208400, 7909656300, -3931426800,
         1163381400, -155117520
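
As a cross-check on entries of this kind, the coefficients can be generated from a closed form. The sketch below assumes the product of two binomial coefficients with alternating sign, (-1)^k C(n, k) C(n + k, k), which matches the example P*_3(x) quoted with the table; the function name is illustrative only.

```python
import math

def shifted_legendre_coeffs(n):
    """Coefficients of P*_n(x), lowest to highest power, under the
    assumed closed form (-1)^k * C(n, k) * C(n + k, k)."""
    return [(-1) ** k * math.comb(n, k) * math.comb(n + k, k)
            for k in range(n + 1)]

# Example: the cubic quoted in the table heading.
print(shifted_legendre_coeffs(3))
```

Since the coefficients are exact integers, any single tabulated entry can be verified without rounding concerns.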




Table V 
The First 12 Chebyshev Polynomials T_n(x) (see p. 455)


T_0(x) = 1 (but T_0 = 1/2; see p. 457)

T_1(x) = x

T_2(x) = 2x^2 - 1

T_3(x) = 4x^3 - 3x

T_4(x) = 8x^4 - 8x^2 + 1

T_5(x) = 16x^5 - 20x^3 + 5x

T_6(x) = 32x^6 - 48x^4 + 18x^2 - 1

T_7(x) = 64x^7 - 112x^5 + 56x^3 - 7x

T_8(x) = 128x^8 - 256x^6 + 160x^4 - 32x^2 + 1

T_9(x) = 256x^9 - 576x^7 + 432x^5 - 120x^3 + 9x

T_10(x) = 512x^10 - 1280x^8 + 1120x^6 - 400x^4 + 50x^2 - 1

T_11(x) = 1024x^11 - 2816x^9 + 2816x^7 - 1232x^5 + 220x^3 - 11x

T_12(x) = 2048x^12 - 6144x^10 + 6912x^8 - 3584x^6 + 840x^4 - 72x^2 + 1
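
The polynomials of this table follow from the familiar three-term recurrence T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x). A minimal sketch, with coefficient lists ordered from lowest to highest power (the function name is illustrative only):

```python
def chebyshev_polynomials(n_max):
    """Coefficient lists (lowest power first) of T_0 .. T_n_max,
    built from T_{n+1} = 2x T_n - T_{n-1}."""
    polys = [[1], [0, 1]]
    for n in range(1, n_max):
        nxt = [0] + [2 * c for c in polys[n]]   # 2x * T_n
        for i, c in enumerate(polys[n - 1]):    # subtract T_{n-1}
            nxt[i] -= c
        polys.append(nxt)
    return polys[: n_max + 1]

for n, p in enumerate(chebyshev_polynomials(5)):
    print(n, p)
```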


Table VI 
The First 12 Powers (1/2)(2x)^n Expressed in Chebyshev Polynomials T_k(x); (with T_0 = 1/2)


           T_0   T_2   T_4   T_6   T_8   T_10  T_12

n =  0:      1
n =  2:      2     1
n =  4:      6     4     1
n =  6:     20    15     6     1
n =  8:     70    56    28     8     1
n = 10:    252   210   120    45    10     1
n = 12:    924   792   495   220    66    12     1


(For example: 32x^6 = 20T_0 + 15T_2 + 6T_4 + T_6)


           T_1   T_3   T_5   T_7   T_9   T_11

n =  1:      1
n =  3:      3     1
n =  5:     10     5     1
n =  7:     35    21     7     1
n =  9:    126    84    36     9     1
n = 11:    462   330   165    55    11     1


(For example: 256x^9 = 126T_1 + 84T_3 + 36T_5 + 9T_7 + T_9)
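
The entries of this table are binomial coefficients: (1/2)(2x)^n is the sum of C(n, j) T_{n-2j}(x) over j = 0, ..., [n/2], provided T_0 is counted as 1/2. A short sketch (the function name is illustrative only):

```python
import math

def power_in_chebyshev(n):
    """Return {k: coefficient of T_k} for (1/2)(2x)^n, with T_0 taken as 1/2."""
    return {n - 2 * j: math.comb(n, j) for j in range(n // 2 + 1)}

# Numerical check at one abscissa, using T_k(cos t) = cos(k t).
x = 0.3
t = math.acos(x)
expansion = sum(c * (0.5 if k == 0 else math.cos(k * t))
                for k, c in power_in_chebyshev(6).items())
print(0.5 * (2 * x) ** 6, expansion)
```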




Table VII 
Coefficients of the First 12 Shifted Chebyshev Polynomials T*_n(x) (see p. 456).


(Negative numbers carry a minus sign; sequence: lowest to highest power;
e.g., T*_3(x) = -1 + 18x - 48x^2 + 32x^3)
T*_0(x) = 1 (but T*_0 = 1/2; see p. 457)
n = 1:   -1, 2
n = 2:   1, -8, 8
n = 3:   -1, 18, -48, 32
n = 4:   1, -32, 160, -256, 128
n = 5:   -1, 50, -400, 1120, -1280, 512
n = 6:   1, -72, 840, -3584, 6912, -6144, 2048
n = 7:   -1, 98, -1568, 9408, -26880, 39424, -28672, 8192
n = 8:   1, -128, 2688, -21504, 84480, -180224, 212992, -131072, 32768
n = 9:   -1, 162, -4320, 44352, -228096, 658944, -1118208, 1105920, -589824, 131072
n = 10:  1, -200, 6600, -84480, 549120, -2050048, 4659200, -6553600, 5570560, -2621440,
         524288
n = 11:  -1, 242, -9680, 151008, -1208064, 5637632, -16400384, 30638080, -36765696,
         27394048, -11534336, 2097152
n = 12:  1, -288, 13728, -256256, 2471040, -14057472, 50692096, -120324096, 190513152,
         -199229440, 132120576, -50331648, 8388608


n=0 
n=1 
= 2: 
n= 3 
n=4 
n == 
n= 6 
= 7: 
= 8: 
n= 
n= 10 
n= 11 
n= 12 
n=7 
n=8 
n=9; 
n= 10 
n= Íl 
n= 12 


924 
3432 
12870 
48620 
184756 
705432 
2704156 


T7 


153 
1140 
7315 

42504 


3003 
11440 
43758 

167960 
646646 
2496144 


TS 


18 
190 
1540 
10626 




Table VIII 
The First 12 Powers (1/2)(4x)^n Expressed in the Shifted Chebyshev Polynomials T*_k(x);


31824 
125970 
497420 

1961256 


Tř 


1 

20 
231 
2024 


(with T*_0 = 1/2), see p. 461


220 
1001 
4368 

18564 
77520 
319770 
1307504 


* 
Tio 


1 
22 
276 


Ti 


364 
1820 
8568 

38760 
170544 
735471 


* 
Th 


1 
24 


TS 


560 
3060 
15504 
74613 
346104 


Ti 


1 




T* 


120 
816 
4845 
26334 
134596 


(For example: 32768x^8 = 12870T*_0 + 11440T*_1 + 8008T*_2 + 4368T*_3 + 1820T*_4
+ 560T*_5 + 120T*_6 + 16T*_7 + T*_8)
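
The rows of this table are again binomial coefficients: with T*_0 counted as 1/2, the power (1/2)(4x)^n is the sum of C(2n, n - k) T*_k(x) for k = 0, ..., n. A short sketch that regenerates any row (the function name is illustrative only):

```python
import math

def power_in_shifted_chebyshev(n):
    """Coefficients of T*_0 .. T*_n in (1/2)(4x)^n, with T*_0 taken as 1/2."""
    return [math.comb(2 * n, n - k) for k in range(n + 1)]

# The row quoted in the example of the table.
print(power_in_shifted_chebyshev(8))

# Numerical check at one abscissa, using T*_k(x) = cos(k t) with cos t = 2x - 1.
x = 0.3
t = math.acos(2 * x - 1)
row = power_in_shifted_chebyshev(5)
value = sum(c * (0.5 if k == 0 else math.cos(k * t)) for k, c in enumerate(row))
print(0.5 * (4 * x) ** 5, value)
```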




Table IX 


Coefficients of the First 13 Shifted Chebyshev Polynomials of the Second Kind, U*_n(x)
(see p. 289). (Negative numbers carry a minus sign; sequence: lowest to highest power;
e.g., U*_3(x) = -4 + 40x - 96x^2 + 64x^3)


n = 0:   1
n = 1:   -2, 4
n = 2:   3, -16, 16
n = 3:   -4, 40, -96, 64
n = 4:   5, -80, 336, -512, 256
n = 5:   -6, 140, -896, 2304, -2560, 1024
n = 6:   7, -224, 2016, -7680, 14080, -12288, 4096
n = 7:   -8, 336, -4032, 21120, -56320, 79872, -57344, 16384
n = 8:   9, -480, 7392, -50688, 183040, -372736, 430080, -262144, 65536
n = 9:   -10, 660, -12672, 109824, -512512, 1397760, -2293760, 2228224, -1179648, 262144
n = 10:  11, -880, 20592, -219648, 1281280, -4472832, 9748480, -13369344, 11206656,
         -5242880, 1048576
n = 11:  -12, 1144, -32032, 411840, -2928640, 12673024, -35094528, 63504384, -74711040,
         55050240, -23068672, 4194304
n = 12:  13, -1456, 48048, -732160, 6223360, -32587776, 111132672, -254017536, 392232960,
         -403701760, 265289728, -100663296, 16777216
n = 13:  -14, 1820, -69888, 1244672, -12446720, 77395968, -317521920, 889061376,
         -1725825024, 2321285120, -2122317824, 1258291200, -436207616, 67108864




Table X 


Coefficients of the First 11 Laguerre Polynomials L_n(x) (see p. 298). (Negative numbers
carry a minus sign; sequence: lowest to highest power; e.g., L_3(x) = 6 - 18x + 9x^2 - x^3)


n = 0:   1
n = 1:   1, -1
n = 2:   2, -4, 1
n = 3:   6, -18, 9, -1
n = 4:   24, -96, 72, -16, 1
n = 5:   120, -600, 600, -200, 25, -1
n = 6:   720, -4320, 5400, -2400, 450, -36, 1
n = 7:   5040, -35280, 52920, -29400, 7350, -882, 49, -1
n = 8:   40320, -322560, 564480, -376320, 117600, -18816, 1568, -64, 1
n = 9:   362880, -3265920, 6531840, -5080320, 1905120, -381024, 42336, -2592, 81, -1
n = 10:  3628800, -36288000, 81648000, -72576000, 31752000, -7620480, 1058400, -86400,
         4050, -100, 1
n = 11:  39916800, -439084800, 1097712000, -1097712000, 548856000, -153679680,
         25613280, -2613600, 163350, -6050, 121, -1


Values of the Normalized Laguerre Functions
The following table of the values of the normalized Laguerre functions (in double arguments)


    \varphi_n(x) = \frac{e^{-x}}{n!} L_n(2x)    (n \le 20; see p. 298)


was computed under the auspices of the National Bureau of Standards, Washington, D.C.,
(in the original to ten decimal places) and is published by permission of the Bureau.
(Negative numbers are underlined.) Recurrence relation:


    \varphi_{n+1}(x) = \frac{1}{n+1} [(2n + 1 - 2x)\varphi_n(x) - n\varphi_{n-1}(x)]
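
The recurrence relation lends itself directly to computation. The sketch below assumes the starting values φ_0(x) = e^{-x} and φ_1(x) = (1 - 2x)e^{-x}, which follow from L_0 = 1 and L_1(x) = 1 - x; the function name is illustrative only.

```python
import math

def laguerre_functions(x, n_max):
    """Values phi_0(x) .. phi_n_max(x) of the normalized Laguerre functions,
    generated by the three-term recurrence (starting values assumed as
    phi_0 = exp(-x), phi_1 = (1 - 2x) exp(-x))."""
    phi = [math.exp(-x)]
    if n_max >= 1:
        phi.append((1.0 - 2.0 * x) * phi[0])
    for n in range(1, n_max):
        phi.append(((2 * n + 1 - 2.0 * x) * phi[n] - n * phi[n - 1]) / (n + 1))
    return phi

for n, value in enumerate(laguerre_functions(0.3, 4)):
    print(n, round(value, 5))
```

For x = 0.3 the first values agree in magnitude with the tabulated column .74082, 29633, 01482, 21928.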




COIDWNAWNHKO/]F 


COrIDWNWAWNHK ojs 


8 


8 


© 


|! 
t 
Ur 


.08208 


U Juv 
ju 
OO |e 
Oo jN 
© 100 


D 
© 
N 
N 
N 


Ww 
rA 
N 
Un 


Ww 
N 
oo 
GN 
— 


| 


N 
oo 
Nn 
ESS 
— 


N 
WwW 
N 
N 
Ww 


n 
co 
WC 
O 
A 


— 
Ww 
WwW 
Nm 
— 


| 


: 


© 
N 
oO 
oo 
N 


Appendix 
Table X (Cont'd) 
03 | 04 
.74082 | .67032 
29633 | 13406 
01482 | 18769 
21928 | 35214 
33974 | 40505 
39534 | 38257 
213 | 31283 
37349 | 21730 
32042 | 11198 
25188 841 
17508 | 08548 
572 | 16461 
01818 | 22618 
05423 | 26909 
11915 | 29356 
17497 | 30074 
22074 | 29244 
25603 | 27086 
28082 | 23844 
29544 | 19769 
30047 | 15107 


0.5 
.60653 


Numerical Tables 521 


Table X (Cont’d) 


n 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 


18| 07222 | 14610 | 08719 | 04037 | 13118 | 12442 | 03497 
19| 14360 | 09254 | 04280 | 13433 | 11240 | 00595 | 10024 
20; 11090 | 02774 | 13205 | 10942 660 | 11226 | 12846 


+ 
t 
4 


0 

1 00002 | 00001 
2} 00065 | 00028 | 00012 
3} 00450 | 00213 | 00099 
4; 02090 | 01099 | 00561 
5} 06747 | 04006 | 02276 
6| 15126 | 10433 | 06740 
7) 22308 | 18924 | 14417 
8| 17438 | 21622 | 21135 
9} 02391 | 09606 | 17713 
10| 17368 | 10815 | 00463 
11| 05721 15615 | 15724 
12; 14490 | 03407 | 09596 
13| 06395 | 15200 | 10823 
14) 13912 | 02078 | 11230 
15} 03187 | 14048 | 09353 
16| 14039 | 04582 | 09944 
17; 02781 11874 | 10558 
18| 11868 | 08945 | 06459 
19| 09506 | 07012 | 12382 
20| 05096 | 12355 | 00564 


nN ojo l 
RSeSleEseslaSalsalesles 
NAO Olw BRN LV S OCIS Shy 
A Ww CO u | |\O ~] CO 160 CO O Ajah WI a 
WwW A m je j N NO QD Olu oo lo NIN DOM a 
mt — NO č 
Si E RII S REIS Se glg Sis Sis Ss 
Gis 8 Ses 2 SBS Ik SE SR ORS SS 
Ni A WIGOiIN OO A inmiw AJA jua iu QDiIo nN © 
NO 
D 
[NO] 
—" 
fon 
ht 
O 
N 
O 
£ 
— 
È 
iw) 
N 
CO 
oo 
— 




Table XI 
Conversion of a Harmonic Series into a Polynomial Expansion
The series


    f(x) = \sum_{k=0}^{N} a_k \cos (2k + 1)\frac{\pi}{2}x + \sum_{k=1}^{N} b_k \sin k\pi x


is transformed into the series

    f(x) = \sum_{i=0}^{\infty} c_i T_i(x)


(neglecting all c_i with i > 13) by multiplying the a_k by the successive columns of the first matrix,
the b_k by the successive columns of the second matrix; see p. 351. (Negative numbers
underlined.) (This table was prepared under the auspices of the National Bureau of
Standards, Washington, D.C., and is published by permission of the Bureau; N is limited
to N \le 11.)


k c_0 c_2 c_4 c_6 c_8 c_10 c_12

0 .472001 .499403 .027992 .000597 .000007 .000000 .000000 
1 265857 292636 740869 205626 024909 001735 000079 
2 204268 300940 138934 691873 418338 107419 016207 
3 171971 279881 032140 402029 451227 560378 242595 
4 151323 259041 097117 211630 465982 121520 567021 
5 136676 241238 125502 101019 361305 347427 181560 
6 125594 226275 138516 034324 262471 374524 116288 
7 116832 213613 144229 007805 185551 337894 256291
8 109679 202772 146188 035480 127870 286247 303192 
9 103697 193378 146117 054225 084651 235836 304120 
10 098598 185148 144919 067216 051946 191405 284458 
11 094184 177867 143093 076366 026880 153695 256880 
k c_1 c_3 c_5 c_7 c_9 c_11 c_13

1 .569231 .666917 .104282 .006841 .000250 .000006 .000000
2 424765 058224 745649 315042 058248 006296 000453 
3 353450 167800 294175 589723 503360 170328 033705 
4 309062 193132 097432 459239 290219 582676 318085 
5 278050 197132 003055 301849 426211 040189 513869 
6 254818 194304 047027 189771 385862 240016 291674 
7 236577 189152 075527 113651 313247 329561 010673 
8 221764 183315 092486 061327 244663 334392 162549 
9 209424 177443 102825 024535 187474 306725 247246 
10 198938 171797 109159 001931 141479 269136 278343 
11 189884 166476 112969 021356 104801 280138 
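
Individual entries of the two matrices can be checked numerically. The sketch below computes the Chebyshev coefficients of a single harmonic term by an equidistant sum over θ (x = cos θ); the sample count of 64 and the function name are assumptions of this illustration, and the constant term is returned as it stands (not doubled).

```python
import math

def chebyshev_coefficients(f, count, samples=64):
    """Coefficients c_0 .. c_{count-1} of f(x) = c_0 + c_1 T_1(x) + ...,
    from an equidistant midpoint sum over theta (x = cos theta)."""
    thetas = [(j + 0.5) * math.pi / samples for j in range(samples)]
    values = [f(math.cos(t)) for t in thetas]
    coeffs = [2.0 / samples * sum(v * math.cos(i * t) for v, t in zip(values, thetas))
              for i in range(count)]
    coeffs[0] *= 0.5  # constant term itself
    return coeffs

cosine_row = chebyshev_coefficients(lambda x: math.cos(0.5 * math.pi * x), 5)  # k = 0
sine_row = chebyshev_coefficients(lambda x: math.sin(math.pi * x), 4)          # k = 1
print([round(c, 6) for c in cosine_row])
print([round(c, 6) for c in sine_row])
```

The magnitudes agree with the first rows of the two matrices (.472001, .499403, .027992 and .569231, .666917); the signs, which the printed table indicates by underlining, come out of the computation directly.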




Table XII 
Curve Fitting of Equidistant Data.


The given 2n + 1 equidistant ordinates y_k [k = -n, -(n - 1), ..., (n - 1), n] are
fitted by the infinite expansion


    f(x) = \sum_{i=0}^{\infty} c_i T_i(x)


which can be truncated at any suitably chosen i = \nu (see p. 351). The last two ordinates are
reduced to zero (see p. 333). The remaining 2n - 1 ordinates are divided into the two
groups:

    u_k = y_k + y_{-k}    (k = 0, 1, 2, ..., n - 1)

and

    v_k = y_k - y_{-k}    (k = 1, 2, 3, ..., n - 1)


The even coefficients c_{2m} are evaluated by multiplying the u_k by the successive columns of one
of the first group of tables, according to the number of data points (the table extends from
5 to 25 data points, i.e., n = 2 to n = 12; negative numbers are underlined). The odd
coefficients c_{2m+1} are similarly evaluated by multiplying the v_k by the successive columns of
one of the second group of tables. (These tables were computed under the auspices of the
National Bureau of Standards and are published by permission of the Bureau.)
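
The grouping of the ordinates into sums and differences can be sketched as follows. The list layout (ordinate y_k stored at index k + n) and the function name are assumptions of this illustration; the subsequent multiplication by the tabulated columns is not shown.

```python
def split_ordinates(y, n):
    """Split 2n + 1 ordinates y_{-n} .. y_n (stored as y[k + n]) into the
    sums u_k (k = 0 .. n-1) and differences v_k (k = 1 .. n-1).
    The end ordinates y_{-n}, y_n are assumed to be reduced to zero already."""
    u = [y[n + k] + y[n - k] for k in range(0, n)]   # u_0 = 2*y_0 by the formula as written
    v = [y[n + k] - y[n - k] for k in range(1, n)]
    return u, v

# Seven ordinates (n = 3) with vanishing end values.
u, v = split_ordinates([0.0, 1.0, 4.0, 2.0, 6.0, 3.0, 0.0], 3)
print(u, v)
```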


Even C; 


051536 
260872 


.068402 
077288 
201331 


.029805 
103761 
063926 
170194 


.038976 
049960 
087062 
055556 
150219 


Co 


Ce 


.192215 


252040 


.104988 


048188 
265447 


074723 


098060 
107141 


197455 


.069490 


077724 
004228 
155161 
134730 


C4 


199554 
046870 


n=4 


.062008 


046475 
229621 
150828 


n= 5 


.070770 


031293 
091897 
140391 
183270 


Ce 


.006229 


008804 


.073876 


120762 
061421 


.111810 


141858 
001412 
047702 


.042850 


038520 
188491 
135871 
022172 


079105 
088787 
036562 
091975 
061958 


.000020 


000028 


.002715 


004679 
002675 


.032360 


057575 
040006 
019484 


.082590 


136363 
073506 
020528 
000980 


524 Appendix 


Table XII (Cont'd) 


Even C; 
k Co Co C4 C6 Cg Cio Cie 
n=6 
0 .021091 062765 047450 .050557 .065817 .036969 053695 
1 065054 081523 095971 062839 001731 022276 047351 
2 043738 109942 010618 056330 073360 146535 063561 
3 076558 033860 045192 116235 096555 108443 114892 
4 049724 054366 162047 053758 176085 006155 088080 
5 135991 147385 086438 181196 078732 043817 038201 
m= 
0 .027049 037636 .05066 045786 .037667 058439 037718 
1 037099 104213 066741 059319 053798 014038 002342 
2 057015 062020 049309 015222 081722 063230 107341 
3 039324 083701 033169 089660 072997 061509 078444 
4 069167 013397 067635 096508 007112 150629 029117 
5 045377 033142 153880 008840 161891 073419 079610 
6 125186 150928 050308 165515 113337 007891 052879 
n = 
0 .016366 046282 035231 040550 .044555 .030016 049022 
1 047298 064460 079634 057019 033449 044969 013504 
2 033458 085966 033788 006173 025525 088291 061550 
3 051398 046093 013834 059517 096981 042031 031191 
4 035991 064276 058885 089520 032817 018363 119100 
5 063597 000980 075649 065946 048602 125873 060961 
6 041982 018552 140763 050034 127054 119952 032144 
7 116615 151489 023181 145570 131003 027599 045997 
n = 
0 .020641 029875 039438 034074 .032501 .042584 .026731 
1 029584 080574 055967 058507 050576 016780 034104 
2 042566 053695 053968 013507 022049 029715 083680 
3 030690 071033 006341 030837 061859 089196 024389 
4 047188 033442 010528 076840 074948 010331 034580 
5 033365 049612 072851 074385 008558 056361 087685 
6 059201 011327 076189 036770 075088 078357 110501 
7 039239 008150 126688 075785 088391 137623 020381 
8 109600 150469 002571 125318 137505 056487 028018 


Co 


Ce 


C4 


we 
CU MAANINAMN PWN — OS OmArANURWN SK O 


— OO MAANINUNURPWNK OS 


Numerical Tables 525 


Table XII (Cont’d) 


Even C, 


013392 .036556 .028188 .033378  .033483 .026533 .039264 
037102 053135 066292 049638 041173 045205 008011 
027181 069712 037353 026308 000860 038969 032977 
039031 044121 031191 020473 054546 054572 074808 
028497 058906 014235 050479 064660 052097 034002 
043881 023409 026439 077959 041281 051607 052337 
031230 038319 079522 054432 038873 060942 036059 
055617 018929 072962 012561 082404 031420 121822 
036965 000518 113214 091063 052829 136250 062863 
103719 148599 013294 106345 137101 077852 006458 


016656 .024817 .032213 .027288  .028078 .032821 .022764 
024638 065480 047686 052993 045107 027805 039226 
033986 046377 050633 023412 004566 006902 047221 
025270 060263 019820 001567 034004 064401 045676 
036261 035906 012886 042138 063299 042768 029841 
026706 049020 028905 057472 050171 009798 069210 
041192 015398 036391 070649 009167 071840 036492 
029451 029462 081739 034481 057001 047272 007926 
052619 024610 067949 006292 078671 006239 109018 
035040 005211 100895 099390 022613 124033 091632 
098695 146280 025659 089188 132603 092591 014586 


011344 .030160 .023566 .028196 .026858 .023682 .031571 
030490 045167 056270 043336 041143 041494 018535 
022928 058339 036051 032318 013014 011452 012403 
031550 040077 035516 000623 024575 038615 065383 
023704 052155 004974 021530 050563 060969 021701 
034014 028939 001160 053580 056613 016386 012406 
025208 040898 039004 056505 029756 022671 075513 
038950 008945 042256 059622 016473 074394 007693 
027940 022402 081269 016535 065292 026587 035796 
050064 028913 062210 020439 069251 033045 084605 
033386 009591 189860 103146 002024 106515 107898 
094339 143741 035405 073931 125799 102028 033242 


Co Ce C4 Ce Cs Cio Cie 


526 Appendix 


Table XII (Cont’d) 
Odd c; 


i= 2 
.284615 f .052141 .003420 .000125 


n=3 
.041704 : .245354 .092920 .016887 
286942 185147 088970 016743 


.056917 : .152844 ; .103589 
053945 099615 125778 
269300 219981 074465 


.017020 .109589 : .140971 
075576 123542 107482 
057871 010771 
251275 050578 


.023596 . : .098694 
025328 028072 
082305 154887 
058729 095037 
235434 001900 


009284 : .093361 
035218 037615 
029738 057667 
084336 107540 
058302 143562 
221847 048326 


012775 l ; ; .069469 
014824 068921 
041416 036860 
032157 077009 
084285 042814 
057311 150825 
210175 021382 


Cı Cy 


a 


CONNNRWN =e 


1 
2 
3 
4 
5 
6 
7 
8 
9 


SCUWUMAONNMN PWN 


005862 


020347 
018311 
044822 
033467 
083256 
056078 
200067 


007971 


009769 
025078 
020577 
046668 
034123 
081767 
054759 
191231 


.004044 


013231 
012471 
028127 
022074 
047592 
034379 
080075 
053432 
183434 


Numerical Tables 


Table XII (Cont’d) 
Odd c; 


.061780 


061726 
012115 
075101 
058788 
009673 
136128 
102908 


.049078 


065347 
022419 
032006 
080355 
028284 
044280 
112049 
115130 


.043136 


057319 
038800 
015812 
054054 
066528 
001351 
063510 
085511 
120867 


527 


528 


= O VO OANA NA RWN — 


foe eek 


(Reference: Applied Math. Series 37, U.S. Dept. of Commerce, National Bureau of 


005433 
006939 
016834 
014386 
030123 


023066 
047949 
034381 
078315 
052134 
176496 


Cy 


C3 


Appendix 


Table XII (Cont’d) 


Odd c; 


022202 
038677 
054216 
053029 
052570 
025811 
008081 
042940 
059704 
112116 
023715 


C5 


035916
054521
040515
006439
041103
058229
044854
025028
071537
059959
122242


Table XIII
Zeros and Weights of Gaussian Quadrature (see p. 396)
(Reference: Applied Math. Series 37, U.S. Dept. of Commerce, National Bureau of
Standards, Washington, D.C., p. 187).


 n    ±x_i            w_i

 2    0.5773502692    1

 3    0               0.8888888889
      0.7745966692    0.5555555556

 4    0.3399810435    0.6521451549
      0.8611363116    0.3478548451

 5    0               0.5688888889
      0.5384693101    0.4786286705
      0.9061798459    0.2369268851

 6    0.2386191861    0.4679139346
      0.6612093865    0.3607615730
      0.9324695142    0.1713244924

 7    0               0.4179591837
      0.4058451514    0.3818300505
      0.7415311856    0.2797053915
      0.9491079123    0.1294849662

 8    0.1834346425    0.3626837834
      0.5255324099    0.3137066459
      0.7966664774    0.2223810345
      0.9602898565    0.1012285363

 9    0               0.3302393550
      0.3242534234    0.3123470770
      0.6133714327    0.2606106964
      0.8360311073    0.1806481607
      0.9681602395    0.0812743884



Table XIV 


Gaussian Quadrature with Rounded-off Zeros (see p. 396) 
(This table was computed under the auspices of the National Bureau of Standards and is 
published by permission of the Bureau.) 


 n    ±x_i    w_i

 2    0.58    1

 3    0       0.8755832912
      0.77    0.5622083544

 4    0.34    0.6510683761
      0.86    0.3489316239

 5    0       0.5652007082
      0.54    0.4860117674
      0.91    0.2313878782

 6    0.24    0.4694282398
      0.66    0.3554559718
      0.93    0.1751157884

 7    0       0.4087667652
      0.40    0.3816431469
      0.74    0.2855469914
      0.95    0.1284264790

 9    0       0.3274777490
      0.32    0.3073797883
      0.61    0.2682840654
      0.84    0.1834544443
      0.97    0.0771428275
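
The tabulated weights are consistent with the requirement that the formula integrate exactly every polynomial that can be passed through the rounded zeros, i.e., each weight is the integral of the corresponding Lagrangean interpolation polynomial over [-1, 1]. The sketch below works under that assumption; it is an illustration, not necessarily the procedure used in preparing the table, and the function name is hypothetical.

```python
def interpolatory_weights(nodes):
    """Weights w_i = integral over [-1, 1] of the Lagrangean polynomial
    belonging to nodes[i] (assumed construction of the table)."""
    weights = []
    for i, xi in enumerate(nodes):
        coeffs = [1.0]                      # polynomial coefficients, lowest power first
        for j, xj in enumerate(nodes):
            if j == i:
                continue
            scale = xi - xj
            product = [0.0] * (len(coeffs) + 1)
            for p, c in enumerate(coeffs):  # multiply by (x - xj) / (xi - xj)
                product[p] -= c * xj / scale
                product[p + 1] += c / scale
            coeffs = product
        # odd powers integrate to zero over the symmetric range
        weights.append(sum(2.0 * c / (p + 1) for p, c in enumerate(coeffs) if p % 2 == 0))
    return weights

print(interpolatory_weights([-0.77, 0.0, 0.77]))
print(interpolatory_weights([-0.86, -0.34, 0.34, 0.86]))
```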




Table XV 
Quadrature in Terms of End-data (see p. 423) 


Range [0, 1]. The numerical coefficients are multiplied by the derivatives at the end-points
(symmetrized for the even, anti-symmetrized for the odd derivatives), starting with the
lowest order; e.g., the third convergent becomes:


    A_3 = \frac{1}{2}[f(0) + f(1)] + \frac{3}{28}[f'(0) - f'(1)] + \frac{1}{84}[f''(0) + f''(1)] + \frac{1}{1680}[f'''(0) - f'''(1)]


n = 0:  1  (denominator: 2)
n = 1:  6, 1  (denominator: 12)
n = 2:  60, 12, 1  (denominator: 120)
n = 3:  840, 180, 20, 1  (denominator: 1680)
n = 4:  15120, 3360, 420, 30, 1  (denominator: 30240)
n = 5:  332640, 75600, 10080, 840, 42, 1  (denominator: 665280)
n = 6:  8648640, 1995840, 277200, 25200, 1512, 56, 1  (denominator: 17297280)
n = 7:  259459200, 60540480, 8648640, 831600, 55440, 2520, 72, 1  (denominator: 518918400)
n = 8:  8821612800, 2075673600, 302702400, 30270240, 2162160,
        110880, 3960, 90, 1  (denominator: 17643225600)
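
The use of the table may be illustrated on the exponential function, all of whose derivatives are known at both end-points. The closed form for the coefficients used below, (n+1)! (2n+1-k)! / [(k+1)! (2n+2)! (n-k)!], is an assumption of this sketch (it is not stated with the table); it reproduces, for instance, the numerators 840, 180, 20, 1 over the denominator 1680. The function names are illustrative only.

```python
from fractions import Fraction
from math import exp, factorial

def end_data_coefficients(n):
    """Coefficients of the n-th convergent, lowest derivative first
    (assumed closed form, checked against the tabulated rows)."""
    return [Fraction(factorial(n + 1) * factorial(2 * n + 1 - k),
                     factorial(k + 1) * factorial(2 * n + 2) * factorial(n - k))
            for k in range(n + 1)]

def end_data_quadrature(derivs_at_0, derivs_at_1):
    """Approximate the integral of f over [0, 1] from f, f', f'', ... at 0 and 1."""
    n = len(derivs_at_0) - 1
    total = 0.0
    for k, c in enumerate(end_data_coefficients(n)):
        sign = 1 if k % 2 == 0 else -1   # even orders symmetrized, odd anti-symmetrized
        total += float(c) * (derivs_at_0[k] + sign * derivs_at_1[k])
    return total

# f(x) = e^x: every derivative equals 1 at x = 0 and e at x = 1.
a3 = end_data_quadrature([1.0] * 4, [exp(1.0)] * 4)
print(a3, exp(1.0) - 1.0)
```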


INDEX 


A 
ABSOLUTELY integrable, 210, 254 
Adjoint: 
boundary conditions, 363 
system, 98 
vectors, 102 
Algebra: 


fundamental theorem, 4 
postulates, 57 
Algebraic equations, 5-48 
Algorithm: 
Horner’s, 12, 21 
movable strip, 13 
progressive, 175 
Amplitude factor, 444 
Analysis: 
hidden periodicities, 267 
numerical, 4 
stability, 44 
Analytical, 207, 264, 281, 332 
continuation, 441 
Angular frequency, 255, 268 
Apollonius of Perga, 81, 82 
Archimedes, 305, 379 
Arc tan expansion, 492 
Arithmetic mean method, 211, 225, 451 
Asymptotic: 
expansion, 483, 484 
law, frequency response, 261 
relation, Chebyshev coefficients, 489, 
499 
Auxiliary reference system, 125 
Average error, 213, 316 


B 
BALLISTIC: 
galvanometer, 495 
network, 301, 303 
Base vectors, 90, 97, 101, 111, 114 
Berkeley, 171 


Bernoulli numbers, 388 
Bernoulli’s method, 22 
Bessel functions, 350, 351 
Bessel’s: 
differential equation, 464, 467 
interpolation formula, 311 
Best approximation, 158, 230, 316, 
448 
Binomial weighting, 443, 461 
Biorthogonality, 74, 78, 104 
Block diagram, 283 
Borel, 306, 348 
Boundary: 
conditions, 363 
values, quadrature, 428 
Bounded variation, 208, 257, 373 


C 


CANONICAL polynomials, 469 
examples, 470, 471, 475, 483, 486, 
490 
Carrier wave, 220 
Cauchy, 6 
Cayley, 50, 63, 67 
Central differences, 309 
Characteristic: 
equation, 6, 58, 59, 61, 63 
roots, 6 (see also Eigenvalues) 
Chart, harmonic analysis, 250 
Chebyshev: 
coefficients as weight factors, 472 
expansion, convergence, 453, 481 
polynomials, 245, 369, 440, 454, 455 
recurrence relation, 178, 179, 479 
second kind, 194, 289, 370, 477 
shifted, 8, 179, 457, 460, 468, 493 
Coded matrix, 236 
Codiagonal system, 138 
Codability, 123
Coefficients, flexible, 449 


Compatibility:
condition, 152
and noise, 149
Complete function system, 362
Complex:
Fourier series, 215
matrix, inverted, 137
number in polynomial, 16
range, 447, 474, 492
roots, 17, 29, 34
Condition:
compatibility, 152
convergence, 210, 357, 436
Conditions, Dirichlet, 208
Conformal mapping, 37, 293, 442
Continuous spectrum, 252
Contravariant, 101
Convergence:
and noise, 287, 332
Chebyshev expansion, 453
polynomial interpolation, 357
quadrature formula, 434
Taylor series, 353, 440, 447
Convergent expansion, flexible coefficients, 469
Covariant, 101
Cubic equation, 6, 8, 11
Curly D process, 220
Cutoff frequency, 334
empirical, 336


D


DATA:
analysis, 305-378
equidistant, 348
nonequidistant, 372
smoothed, fourth differences, 316
Fourier series, 337
Decay constant, 272
Defective matrix, 110
Deficient linear combinations, 169
Delta function:
convergent expansion, 222
power approximation, 193
Deorthogonalization, 127
Derivative:
by central differences, 313
by simple differences, 312
half line, 313, 323
Determinant, 58, 59, 68, 71, 134, 167
Diagonal matrix, 75, 105, 118
Difference:
coefficient, 307
table, 308
central, 309
errors magnified, 313

Differences, negative order, 326 
Differential equation: 

Bessel, 464, 467 

hypergeometric series, 367 
Differential operators, self-adjoint, 362 
Differentiation: 

empirical function, 321 

Fourier series, 219 

by integration, 324 
Digital computers, 123 
Dirac’s delta function, 222, 262 
Dirichlet: 

conditions, 208 

kernel, 209 
Discrete spectrum, 254, 269 
Distribution of residuals, 316 
Divergence made convergent, 482 
Division: 

in matrix algebra, 64 

synthetic, 24, 295 
Dot-dot-dot writing, 448 
Double roots, 404, 410 
Duality, adjoint set, 98 
Dual representation, 101 


E 


e-APPROXIMATIONS, 426, 476 
Economical distribution of errors, 453 
Economization, 452 
Eigenvalue, 59, 65, 88, 89 
analysis: 
example, 65 
extended matrix, 193 
spectroscopic, 180 
found by quadrature, 429 
largest, 177, 182 
problem, 176, 376 
smallest, 200, 203 
Eigenvalues of differential operator, 
363 
Eigenvectors, 60, 177, 188 (see also 
Principal axes) 
orthogonality, 75, 88, 94, 104
Einstein, 110 
Electric network: 
memory time, 260, 265, 291 




Electric network (cont'd): 
stability condition, 257 
Electronic computers, 123 
Elimination: 
method, 118 
rounding errors, 122
simultaneous, 143 
Ellipse, 82, 84 
Ellipsoid, 83, 84 
Empirical: 
function, differentiated, 321, 324, 
327, 342 
Laplace transform, 287 
End point: 
approximation, 478 
correction, 386, 414 
data, 419, 425 
error, 497 
Entire function, 441 
Equal sign, 244, 449 
Equation: 
biquadratic, 19 
characteristic, 58, 59 
complex coefficients, 43 
cubic, 6, 8, 11 
fourth order, 19 
Hamilton-Cayley, 61, 63, 67 
right side of, 119 
sixtic, 31 
Equations, algebraic, 5-48 
Equidistant data: 
harmonic analysis, 240 
polynomial interpolation, 307, 311, 
348, 352, 357 
Erroneous notions, 1, 2, 243, 451 
Error: 
analysis, tau method, 486, 493, 496, 
498, 499 
oscillations, 247 
of quadrature methods, 385, 388, 
389, 404, 412 
of rearranged series, 462 
of trigonometric interpolation, 242 
term, 468, 472 
Euler, 207, 248 
Evaluation of data, 343 
Evendetermined system, 151 
Even function, 216, 234, 237, 245, 259, 
405 
Excessive accuracy, 275, 287, 289 
Expansion: 
asymptotic, 483, 484 




Expansion (cont'd): 
by recursions, 468 
by trigonometric interpolation, 232 
in Chebyshev polynomials, 454, 479 
in Laguerre functions, 297 
in Legendre polynomials, 287 
in reciprocal powers, 482 
in right half plane, 444 
orthogonal, by interpolation, 372 
ortho-normal, 213, 232 
theorem, Heaviside, 284 
Exponential: 
function, 390, 402, 425, 474 
integral, 444, 482 
Extrapolation, 308 


F 


FACTORABILITY of zero, 62 
Feed-back, 14 
Fejer, 210, 211, 225, 306, 451 
kernel, 211, 227 
Fidelity, increased in sigma smoothing, 
227 


Filter: 
analysis, 264, 265, 266 
Gibbs oscillations smoothed, 266 
ideal low-pass, 267 
Finite expansion, overdetermined, 465 
Fixed strip, 13 
Flexible coefficients, 449, 450, 469 
Focusing power, 210, 211, 271 
Forcing function, 119 
Formal expansion, 465 
Fourier, 207, 248 
analysis: 
applied to smoothing, 331 
equidistant data, 240, 333, 349 
coefficients: 
compared with interpolation co- 
efficients, 419 
order of magnitude, 217, 292, 333 
cosine series, 237, 245, 454 
functions as orthogonal functions, 
214, 362, 365 
integral, 248 
series, 208, 209 
arbitrary limits, 249 
differentiated, 219 
modified, 218 
uniqueness, 243 
versus Fourier integral, 254 


Fourier (cont’d):
sine series, 235, 333
synthesis, 239
transform, 183, 252, 294
of discrete data, 269
Frame of reference, 90, 93, 95, 99,
110, 114, 360
Fredholm, 50, 171, 379, 439
Frequency, 250
angular, 255, 268
negative, 254
response, 256, 258, 261, 264
Frobenius, 50
Function:
as vector, 358
bounded variation, 208, 257, 373
even, 216, 234, 237
odd, 216, 234, 235
quadratically integrable, 215
space, 358, 362, 363
system:
complete, 362
orthogonal, 213, 231, 358, 364
Fundamental polynomial, 274


G


Gauss, 305, 344, 367, 379
differential equation, 367
least-square polynomials, 344
Gaussian quadrature, 396, 400-404
engineering applications, 413
error, 404, 405
rounded-off zeros, 408, 410-412
Geometric series, increased convergence, 482
Gersgorin’s theorem, 181
Gibbs oscillations, 217, 225, 266
Green’s:
function, 51
identity, 363, 365, 366
Gregory, 306
Gregory-Newton interpolation, 307
in infinite domain, 347
Guiding line, 236


H


HALF lines, 309, 311, 313, 324
Half weight, 239, 457
Hamilton-Cayley equation, 61, 63, 67
Hammer blow, 258, 259
Harmonic:
analysis, 207-304
chart, 250
equidistant data, 240
components, 208, 251
vibrations, 495
Heaviside, 282, 283 
expansion theorem, 284 
Hermitian: 
matrix, 89, 190, 200 
polynomials, 370 
Hidden periodicities, 33, 183, 267 
Hilbert, 439 
Horner’s scheme, 12, 21 
Hypergeometric: 
function, 286, 289, 369, 370, 456 
series, 367, 368 


I 


IDEAL low-pass filter, 267 
Inadequate scaling, 161 
Incompatibility, 154, 158, 161 
Increased convergence, 264, 469, 476, 
481, 484, 490 
Independence, linear, 90, 96 
of powers, 458 
Indicial function, 284, 290, 299 
Infinite: 
range 346, 366 
series, 244, 480 
Infinitesimals, 171 
Initial conditions, 469 
Input-output relation, 255, 259 
Integration by parts, 420 
Interference of neighboring peaks, 185, 
186, 270 
Interpolation: 
Bessel’s formula, 311 
by Chebyshev polynomials, 245 
by cosine functions, 238 
by orthogonal polynomials, 371 
by polynomials, equidistant data, 
346, 350, 352 
by sine functions, 235 
Gregory-Newton, 307, 347 
Lagrangean, 397 
nonequidistant, 311, 372 
of Fourier transform, 263, 264 
of Laplace transform, 299 
parabolic, 35 
Stirling formula, 311 
trigonometric, 229 




Inverse: 
of matrix, 64, 79, 80, 116, 129 
of product, 76 
Inversion: 
of Laplace transform, 285, 288, 290, 
292 
of matrix, 118, 121, 130, 137, 141,
145
Inverted polynomial, 8, 9, 24, 295 
Iterative: 
cycle, 195, 196 
length of, 198 
solution, linear system, 189 


J 
JACOBI polynomials, 367, 451, 452


K 


KEPLER, 82 
Kernel: 
Dirichlet, 209 
Fejer, 211 
of sigma process, 227 
Key values, subtabulation, 314 


L 


LAGRANGE, 5, 171, 207, 244, 397 
interpolation, 397 
Laguerre: 
expansion, 298 
functions, 297 
polynomials, 297, 370 
Laplace transform, 280-303 
and Fourier transform, 281 
and Taylor series, 295 
empirical, 287 
interpolation of, 299, 302, 303 
inversion of, 284, 285, 288, 292 
point-to-point correspondence, 282 
redundancy of, 300 
Largest: 
eigenvalue, 181 
root, 26, 29, 30 
Last: 
data, smoothed, 320 
ordinates, weights for, 323, 328, 341 
LC circuits, 303 
Least square: 
approximation, 212, 230, 315, 345 


parabola, 317, 321, 339 
polynomials, 344 
principle, 127, 305, 315 
Legendre, 305, 439 
polynomials, 285, 287, 369, 400, 421,
422
Leibniz, 49, 490 
series, 491 
Leverrier, 144 
Limit concept, 451 
Limiting circle, 447 
Linear: 
differential equations, 464 
independence, 90, 96 
superposition, 91 
system: 
accuracy of solution, 197 
as eigenvalue problem, 192 
compatibility, 153-156 
deficient, 169, 170 
effect of noise, 167 
extremely skew-angular, 166 
homogenized, 191 
inadequate scaling, 161 
normalized, 162 
orthogonalized, 125, 163 
skewness, 161, 168 
solved by iterations, 189 
underdetermined, 151, 156, 159 
Line spectrum, 184, 269 
Local: 
averaging in sigma process, 227 
smoothing, 317, 339 
Lopsided weighting, 452 
Low-pass filter, ideal, 267 


M 


MACLAURIN, 438 
Mapping, C type, 256 
Matrices: 
and eigenvalues, 49-170 
biorthogonal, 74, 104 
commutative, 118 
Matrix, 52 
algebra, 57 
and quadratic surface, 81, 96 
complex, inverted, 137 
defective, 110 
diagonal, 75 
dominantly diagonal, 94 


eigenvalues, 59, 60, 65, 69, 72, 168, 
180, 200, 203 
Hermitian, 89, 190, 200 
identity, 61, 63 
of lowest order, 110 
inversion, 121, 125, 141 
largest eigenvalue, 180 
nonsquare, 142 
numerically equivalent, 123 
operations, 53-57 
orthogonal, 87, 93, 96, 104 
and complex, 90 
orthogonalized, 123, 132, 163 
polynomial, 172, 173, 174 
principal axes, 68, 74, 76, 81, 86, 
106, 118 
recurrent, 175, 273, 346 
reduced, in size, 140 
smallest eigenvalue, 200, 203 
symmetric, 85, 114 
transformation: 
linear group, 114 
orthogonal group, 112 
triangular, 126, 140, 146 
inverted, 130, 146 
triangularized, 135, 136 
Maximum: 
by parabolic interpolation, 35, 271 
convergence, 453, 463 
error, 295, 496 
Measurements, average error, 316 
Mechanical quadrature, 380 
Memory time, 260, 265, 291 
Method of: 
exhaustion, 379 
forced vibrations, 496 
least squares, 212, 305 
moments, 22 
selected points, 504, 506 
Midpoint values, interpolated, 348 
Modulation, 220 
Moment of area, 327 
Movable strip, 13, 24, 48, 296, 322 
Multiple roots, 42, 94 
Multiplication: 
column by column, 56 
matrices, 39, 237, 238, 239, 294 


N 


NASCENT strip, 13 


Nearly singular system, 65, 80, 168 
Negative frequency, 254 
Network: 
analysis, 282 
ballistic, 301, 303 
LC, 303 
Newton, 82, 306 
Newton’s method, 10, 11, 17 
Noise, 149, 150, 167, 325, 338, 344 
Nonanalytical, 254, 325, 332 
Nonequidistant interpolation, 246, 311,
372
Nonfactorability of zero, 57 
Nonintegrable functions, 224 
Nonuniform distribution of data, 246 
Norm, 231, 287 
Normal equations, 316, 346 
Normalization, 88, 104 
Numerical analysis, 4 
Numerical example: 
analytical extension, 444 
Bernoulli method, 26 
canonical polynomials, 470, 471, 
473, 475, 486 
coded matrix, 237, 238 
complex roots, 32, 41 
cubic equation, 8 
defective matrix, 107 
derivative of empirical function, 322, 
324, 326, 328, 330 
difference table, 308, 310 
eigenvalue analysis, 65 
Gaussian quadrature, 401, 403, 410, 
412 
Horner’s scheme, 15, 21 
interpolation, 308, 311 
local smoothing, 319 
matrix inversion, 122, 130, 132 
Newton’s method, 11 
orthogonalization by rotation, 165 
quadrature, 390, 392, 395, 418, 426, 
430, 433 
quartic equation, 20 
separation of exponentials, 276 
smoothing in the large, 338, 342 
square root of complex number, 503 
stability analysis, 45 
successive orthogonalization, 132 
tau method, 476, 481, 485, 488, 491, 
492 
telescoping of power series, 458, 462 
Numerically equivalent, 123, 133, 279 


O 


o for column by column multiplication, 67
Observations, glaring errors, 314, 338 
Odd functions, 216, 234, 235 
Operational viewpoint, 49 
Orthogonal: 
expansion, 362 
function system, 358, 362 
set, 213, 214, 362 
transformation, 110 
Orthogonality, 73, 213, 214, 231, 364, 
400 


Orthogonalization, 123, 132, 163 

Ortho-normal set, 88, 213, 215, 231 

Oscillatory error term, 468 

Overdetermination, 151, 157, 465, 466, 
470 

Oversmoothing, 340 


P 


PARAMETRIC method, 431, 491 
Parexic, 4 
Partial frame, 360 
Peaks, 185, 186, 270 
Periodic error, 495 
Periodicities, hidden, 33, 185, 267 
Perturbation, 144, 145, 146, 148, 280 
Planimeter, 380 
Polynomial, 24, 59, 172, 274 
expansion, well-convergent, 352 
Polynomials: 
Jacobi, 367, 451 
synthetic division, 24, 295 
Positive definite, 189 
Postmultiplication, 76 
Postulates of algebra, 57 
Power: 
expansion, beyond Taylor range, 
463, 486 
expansions, 438-507 
rapidly convergent, 439 
sums, 26, 48 
Powers, 458, 461 
Precision approximation, 426, 489 
Premultiplication, 75 
Principal axes, 68, 74, 76, 81, 86, 106 
(see also Eigenvectors) 
Problem of weighted moments, 274 
Projection, 231, 360 


Pulse, 258, 260, 262 
response, 257, 258, 262 


Q 


QUADRATIC surface, 81, 86, 94, 101, 
104 
Quadrature: 
by differentiation, 419, 423, 424, 434 
by polynomials, 393 
Gaussian, 396, 400, 404, 408, 410, 
413 
mechanical, 380 
methods, 379-437 
Simpson’s formula, 381, 385, 391, 
392, 414 
trapezoidal rule, 380, 386, 391, 392 
Weddle’s rule, 396 
Quartic equations, 19, 20 


R 


RADIUS, convergence, 440, 442
Random: 

scatter, 318 

variations, 305 
Range: 

infinite, 366 

readjusted, 473 
Rayleigh, 439 
Rearrangement, 460 
Reciprocal: 

polynomial (see Inverted polynomial)

radii, 37, 293, 442 
Reciprocity relations, 254 
Rectangular, 93 
Recurrence relations, 39, 173, 178, 179, 

374, 465, 479 

Recurrent matrix, 175, 273, 346 
Reference system: 

general, 90, 110, 114 

rectangular, 93, 95 

skew-angular, 96, 99, 125 
Relativity, 110 
Remainder, 14 
Renormalization of range, 455 
Representation, conjugate, 99 
Residual: 

of observations, 345 

test, 198 

vector, 148 


Resolution power (see Separation 
power) 
Response, 255, 257 
Rigid coefficients, 451 
Roots of: 
cubic, 7, 15 
equation of higher order, 22, 30 
multiple, 42, 94 
near imaginary axis, 40 
quartic, 19 
Rounded-off zeros, 408 
Routh, 44 
Ruffini, 13 
Runge, 306, 348, 357 
Runge phenomenon, 348 


S 


SCALAR product, invariance, 114, 117 
Scaled matrix, 182, 190 
Scale factor, 7, 161 
Second: 
derivative, 327, 330, 333, 340, 342 
order differential operator, 366 
orthogonality, 377 
Selected points, 506 
Self-adjoint: 
boundary conditions, 363 
differential operator, 362 
vector system, 99 
Semiconvergent expansion, 438 
Separation: 
of exponentials, 272, 279 
power, 198, 270 
Shifted Chebyshev polynomials, 455, 
456 
Sigma: 
factors, 33, 220 
smoothing, 33, 225, 228, 267 
Signal, 255 
Simpson’s formula, 384, 385, 414, 417 
Sine integral, 487 
Singularity, 440 
Singular point, 493 
Sixtic equation, 31 
Skew-angular, 125 
Skewness, 161, 168 
Smoothing: 
by fourth differences, 316 
by truncation, 337 
in the large, 331, 342 
of Gibbs oscillations, 33, 225, 267 


parameter, 334 
Smoothness of trigonometric interpolation, 235
Solution of: 
differential equations, 433, 471 
linear systems, 189-198 
Space, many dimensional, 359 
Spectral lines, 184 
Spectroscopic analysis, 180 
Spectrum, continuous, 252 
Spherical subspace, 95 
Square: 
root, 485, 500 
wave, 225, 226 
Stability:
analysis, 44-48
condition, 257 
Stable mapping, 257 
Starting vector, 204 
Steady-state response, 256 
Stirling, 306 
functions, 310, 354 
interpolation, 311 
Strip, 13 
Sturm-Liouville, 364 
Sub-space, 95 
Surfaces, second order, 81 
Surplus equations, 465 
Sylvester, 50 
Symmetric matrix, 85, 114 
Symmetrization of matrix, 189, 190 
Synthetic division, 24, 296 
Systematic error, 497 


T 


TABULATION of: 
functions, 306 
principal axes, 69, 72 
Take-off measurements, 319 
Tau method, 464 
error estimation, 493 
examples, 474, 478, 482, 485, 489, 
492 
Taylor, 438 
coefficients by recursion, 441 
series, 295, 306, 452, 505 
and ultraspherical expansion, 452 
complex range, 448 
exponential integral, 445 
weighted, 476 


Telescoping, 440, 457 
by rearrangement, 460 
Tensor, 52 
Theorem, fundamental, 5 
Third power convergence, 241 
Time: 
lag, 267 
scale, normalized, 284 
Tracking data, 327 
Transfer function, 256 
Transformation: 
by reciprocal radii, 37, 293, 442 
linear, 114 
of coordinates, 90, 93 
of eigenvalue, 202 
of matrix, 112 
orthogonal, 96, 110 
to principal axes, 90, 101 
Transposition:
identity, fundamental, 88 
of matrix, 70, 113, 116 
Trapezoidal formula, 381, 386, 387, 
389 
Trial vector, 174 
Trigonometric: 
functions, Chebyshev expansion, 350 
interpolation, 229, 234, 241, 244, 
294, 349 
series, nonintegrable functions, 224 
Truncation of: 
Chebyshev expansion, 351 
Fourier series, 335, 337, 341 
Twilight zone, 338 


U 


ULTRASPHERICAL polynomials, 369,
452

Undetermined system, 151, 156, 159
Uniform distribution, errors, 453
Unit:

circle, scanned, 32

pulse, 258, 290
Unsmooth function, 392, 393, 396, 403
Unsmoothness of polynomial interpolation, 235
Upper bound: 
eigenvalue, 181, 191 
error, 462 


V


VARIATION of constants, 487 
Vector: 
analyzed, 97 
covariant, 101 
contravariant, 101 
dual representation, 102 
residual, 148 
Vibration, 495 
Viète, 49 


W 


WEDDLE’s rule, 396 
Weierstrass, 50, 347 
theorem, 347 
Weighted: 
moments, 274, 407 
orthogonality, 366 
Weight factor one half, 239 
Weight factors, sigma process, 220 
Weighting of: 
asymptotic expansion, 484 
partial sums, 476 
Weights, quadrature: 
arbitrary zeros, 398, 407 
Gaussian, 378, 401, 403 


Z 


ZERO: 
factorability, 64 
nonfactorability, 57 
Zeroing out of elements, 120 
Zeros: 
Chebyshev polynomials, 348 
Gaussian, 402 
rounded-off, 408 
orthogonal polynomials, 372 


MATHEMATICS 


APPLIED ANALYSIS 


Cornelius Lanczos 


This is a basic text for graduate and advanced undergraduate study in those areas of 
mathematical analysis that are of primary concern to the engineer and the physicist, 
most particularly analysis and design of finite processes which approximate the 
solution of an analytical problem. The work comprises seven chapters: 


Chapter I (Algebraic Equations) deals with the search for the roots of algebraic 
equations encountered in vibration and flutter problems and in problems of 
static and dynamic stability. Useful computing techniques are discussed, in 
particular the Bernoulli method and its ramifications. 


Chapter II (Matrices and Eigenvalue Problems) is devoted to a systematic 
development of the properties of matrices, especially in the context of 
industrial research. 


Chapter III (Large-Scale Linear Systems) discusses the “spectroscopic 
method” of finding the real eigenvalues of large matrices and the corresponding 
method of solving large-scale linear equations, as well as an additional 
treatment of a perturbation problem, and other topics. 


Chapter IV (Harmonic Analysis) deals primarily with the interpolation aspects 
of the Fourier series, and its flexibility in representing empirically given 
equidistant data. 


Chapter V (Data Analysis) deals with the problem of reduction of data and of 
obtaining the first and even second derivatives of an empirically given 
function—constantly encountered in tracking problems and in curve-fitting 
problems. Two methods of smoothing are discussed: smoothing in the small 
and smoothing in the large. 


Chapter VI (Quadrature Methods) surveys a variety of quadrature methods 
with particular emphasis on Gaussian quadrature and its use in solving 
boundary value problems and eigenvalue problems associated with ordinary 
differential equations. 


Chapter VII (Power Expansions) discusses the theory of orthogonal function 
systems, in particular the “Chebyshev polynomials.” 


This unique work, perennially in demand, belongs in the library of every engineer, 
physicist, or scientist interested in the application of mathematical analysis to 
engineering, physical and other practical problems. 


Unabridged Dover (1988) republication of the edition published by Prentice-Hall, 
Inc., Englewood Cliffs, New Jersey, 1956. 


$22.95 USA PRINTED IN THE USA 


ISBN-13: 978-0-486-65656-4 
ISBN-10: 0-486-65656-X 




